Preface


This book is an introduction to metalogic, aimed especially at 
students of computer science and philosophy. “Metalogic” is so- 
called because it is the discipline that studies logic itself. Logic 
proper is concerned with canons of valid inference, and its sym- 
bolic or formal version presents these canons using formal lan- 
guages, such as those of propositional and first-order logic. Meta- 
logic investigates the properties of these languages, and of the 
canons of correct inference that use them. It studies topics such 
as how to give precise meaning to the expressions of these for- 
mal languages, how to justify the canons of valid inference, what 
the properties of various derivation systems are, including their 
computational properties. These questions are important and 
interesting in their own right, because the languages and deriva- 
tion systems investigated are applied in many different areas— 
in mathematics, philosophy, computer science, and linguistics, 
especially—but they also serve as examples of how to study for- 
mal systems in general. The logical languages we study here are 
not the only ones people are interested in. For instance, linguists 
and philosophers are interested in languages that are much more 
complicated than those of propositional and first-order logic, and 
computer scientists are interested in other kinds of languages 
altogether, such as programming languages. And the methods
we discuss (how to give semantics for formal languages, how
to prove results about formal languages, how to investigate the
properties of formal languages) are applicable in those cases as
well.

Like any discipline, metalogic has both a set of results or facts
and a store of methods and techniques, and this text covers both.
Many students won’t need to know all of the results we discuss 
outside of this course, but they will need and use the methods 
we use to establish them. The Löwenheim-Skolem theorem, say,
does not often make an appearance in computer science, but the 
methods we use to prove it do. On the other hand, many of the 
results we discuss do have relevance for certain debates, say, in the 
philosophy of science and in metaphysics. Philosophy students 
may not need to be able to prove these results outside this course, 
but they do need to understand what the results are—and you 
really only understand these results if you have thought through 
the definitions and proofs needed to establish them. These are, in 
part, the reasons why the results and the methods covered in
this text are recommended study—in some cases even required— 
for students of computer science and philosophy. 

The material is divided into three parts. Part I concerns itself
with the theory of sets. Logic and metalogic are historically
connected very closely to what’s called the “foundations of
mathematics.” Mathematical foundations deal with how
mathematical objects such as integers, rational and real numbers,
functions, spaces, etc., should ultimately be understood. Set theory
provides one answer (there are others), and so set theory and 
logic have long been studied side-by-side. Sets, relations, and 
functions are also ubiquitous in any sort of formal investigation, 
not just in mathematics but also in computer science and in some 
of the more technical corners of philosophy. Certainly for the pur- 
poses of formulating and proving results about the semantics and 
proof theory of logic and the foundation of computability it is es- 
sential to have a terminology in which to do this. For instance, 
we will talk about sets of expressions, relations of consequence 
and provability, interpretations of predicate symbols (which turn 
out to be relations), computable functions, and various relations 
between and constructions using them. It will be good to have 
shorthand symbols for these, and to think through the general
properties of sets, relations, and functions. If you are not used to
thinking mathematically and to formulating mathematical proofs, then
think of the first part on set theory as a training ground: all the 
basic definitions will be given, and we’ll give increasingly compli- 
cated proofs using them. Note that understanding these proofs— 
and being able to find and formulate them yourself—is perhaps 
more important than understanding the results, especially in the 
first part. If mathematical thinking is new to you, it is important 
that you think through the examples and problems. 

In the first part we will establish one important result, how- 
ever. This result—Cantor’s theorem—relies on one of the most 
striking examples of conceptual analysis to be found anywhere 
in the sciences, namely, Cantor’s analysis of infinity. Infinity has 
puzzled mathematicians and philosophers alike for centuries. Un- 
til Cantor, no-one knew how to properly think about it. Many 
people even considered it a mistake to think about it at all, and 
believed that the notion of an infinite collection itself was incoher- 
ent. Cantor made infinity into a subject we can coherently work 
with, and developed an entire theory of infinite collections—and 
infinite numbers with which we can measure the sizes of infinite 
collections. He showed that there are different levels of infinity. 
This theory of “transfinite” numbers is beautiful and intricate, 
and we won’t get very far into it; but we will be able to show 
that there are different levels of infinity, specifically, that there 
are “countable” and “uncountable” levels of infinity. This result 
has important applications, but it is also really the kind of result 
that any self-respecting mathematician, computer scientist, and 
philosopher should know. 

In part II, we turn to first-order logic. We will define the lan- 
guage of first-order logic and its semantics, i.e., what first-order 
structures are and when a sentence of first-order logic is true in a 
structure. This will enable us to do two important things: (1) We 
can define, with mathematical precision, when a sentence is a 
logical consequence of another. (2) We can also consider how 
the relations that make up a first-order structure are described,
or characterized, by the sentences that are true in them. This in
particular leads us to a discussion of the axiomatic method, in 
which sentences of first-order languages are used to characterize 
certain kinds of structures. Proof theory will occupy us next, and 
we will consider the original version of the sequent calculus and 
natural deduction as defined in the 1930s by Gerhard Gentzen. 
(Your instructor may choose to cover only one; in that case, any
reference to “derivations” and “derivability” will mean whatever
system they chose.) The semantic notion of consequence and the
syntactic notion of derivability give us two completely different 
ways to make precise the idea that a sentence may follow from 
some others. The soundness and completeness theorems link 
these two characterizations. In particular, we will prove Gödel’s
completeness theorem, which states that whenever a sentence is 
a semantic consequence of some others, then it is also derivable 
from them. An equivalent formulation is: if a collection of sen- 
tences is consistent—in the sense that nothing contradictory can 
be proved from them—then there is a structure that makes all of 
them true. 

The second formulation of the completeness theorem is perhaps
the more surprising. Around the time Gödel proved this
result (in 1929), the German mathematician David Hilbert fa- 
mously held the view that consistency (i.e., freedom from con- 
tradiction) is all that mathematical existence requires. In other 
words, whenever a mathematician can coherently describe a 
structure or class of structures, then they should be entitled to be- 
lieve in the existence of such structures. At the time, many found 
this idea preposterous: just because you can describe a struc- 
ture without contradicting yourself, it surely does not follow that 
such a structure actually exists. But that is exactly what Gödel’s
completeness theorem says. In addition to this paradoxical— 
and certainly philosophically intriguing—aspect, the complete- 
ness theorem also has two important applications which allow us 
to prove further results about the existence of structures which 
make given sentences true. These are the compactness and the 
Löwenheim-Skolem theorems.




In part III, we connect logic with computability. Again, there 
is a historical connection: David Hilbert had posed as a funda- 
mental problem of logic to find a mechanical method which would 
decide, of a given sentence of logic, whether it has a proof. Such 
a method exists, of course, for propositional logic: one just has 
to check all truth tables, and since there are only finitely many 
of them, the method eventually yields a correct answer. Such a 
straightforward method is not possible for first-order logic, since 
the number of possible structures is infinite (and structures
themselves may be infinite). Logicians worked for years to find a
more ingenious method. Alonzo Church and Alan Turing
eventually established that there is no such method. In order to 
do this, it was necessary to first provide a precise definition of 
what a mechanical method is in general. If a decision procedure 
had been proposed, presumably it would have been recognized 
as an effective method. To prove that no effective method exists, 
you have to define “effective method” first and give an impossi- 
bility proof on the basis of that definition. This is what Turing 
did: he proposed the idea of a Turing machine as a mathematical
model of what a mechanical procedure can, in principle, do.
This is another example of a conceptual analysis of an informal 
concept using mathematical machinery; and it is perhaps of the 
same order of importance for computer science as Cantor’s anal- 
ysis of infinity is for mathematics. Our last major undertaking 
will be the proof of two impossibility theorems: we will show that 
the so-called “halting problem” cannot be solved by Turing ma- 
chines, and finally that Hilbert’s “decision problem” (for logic) 
also cannot. 

This text is mathematical, in the sense that we discuss math- 
ematical definitions and prove our results mathematically. But it 
is not mathematical in the sense that you need extensive math- 
ematical background knowledge. Nothing in this text requires 
knowledge of algebra, trigonometry, or calculus. We have made 
a special effort to also not require any familiarity with the way 
mathematics works: in fact, part of the point is to develop the kinds
of reasoning and proof skills required to understand and prove 
our results. The organization of the text follows mathematical 
convention, for one reason: these conventions have been devel- 
oped because clarity and precision are especially important, and 
so, e.g., it is critical to know when something is asserted as the
conclusion of an argument, is offered as a reason for something 
else, or is intended to introduce new vocabulary. So we follow 
mathematical convention and label passages as “definitions” if 
they are used to introduce new terminology or symbols; and as 
“theorems,” “propositions,” “lemmas,” or “corollaries” when we 
record a result or finding. Other than these conventions, we will 
use the methods of logical proof that may already be familiar 
from a first logic course, and we will also make extensive use 
of the method of induction to prove results. Two chapters of the 
appendix are devoted to these proof methods. 


Notes for instructors The material in this book is suitable for 
a semester-long second course in formal logic. I cover it in 12 
weeks in Logic II taught at the University of Calgary, although I 
don’t cover everything in as much detail as there is in this book. 
For instance, I typically only talk about natural deduction, and 
leave out detailed proofs of completeness for identity. Students 
have taken Logic I, typically taught from forall x: Calgary, which 
uses the same natural deduction rules, except in Fitch format. 

The most recent version of this book is available in PDF at 
slc.openlogicproject.org, but changes frequently. The CC BY li- 
cense gives you the right to download and distribute the book 
yourself. In order to ensure that all your students have the same 
version of the book throughout the term you're using it, you 
should do so: upload the PDF you decide to use to your LMS 
rather than merely give your students the link. You are also free 
to have the PDFs printed by your bookstore, but some bookstores 
will be able to purchase and stock the softcover books available 
on Amazon. 

The syntax, semantics, and proof systems for first-order logic
are supported by Graham Leach-Krouse’s free, online logic
teaching software application Carnap (carnap.io). This allows 
for submission and automated marking of exercises such as natu- 
ral deduction and sequent calculus derivations, giving structures 
for simple theories, and symbolization exercises. There is also a 
Turing machine simulator at turing.openlogicproject.org that can 
be used to illustrate the material in part III. The examples there 
are available pre-loaded in the simulator. 


Georg Cantor 
1845 - 1918 
Relations,
Functions 


CHAPTER 1 


Sets 


1.1 Extensionality 


A set is a collection of objects, considered as a single object. The 
objects making up the set are called elements or members of the 
set. If x is an element of a set a, we write x ∈ a; if not, we write
x ∉ a. The set which has no elements is called the empty set and
denoted “∅”.

It does not matter how we specify the set, or how we order 
its elements, or indeed how many times we count its elements. 
All that matters are what its elements are. We codify this in the 
following principle. 


Definition 1.1 (Extensionality). If A and B are sets, then A = B
iff every element of A is also an element of B, and vice versa.


Extensionality licenses some notation. In general, when we 
have some objects a₁, …, aₙ, then {a₁, …, aₙ} is the set whose
elements are a₁, …, aₙ. We emphasise the word “the”, since
extensionality tells us that there can be only one such set. Indeed,
extensionality also licenses the following: 


{a,a,b} = {a,b} = {b, a}. 

This delivers on the point that, when we consider sets, we don’t
care about the order of their elements, or how many times they 
are specified. 


Example 1.2. Whenever you have a bunch of objects, you can 
collect them together in a set. The set of Richard’s siblings, for 
instance, is a set that contains one person, and we could write it as 
S = {Ruth}. The set of positive integers less than 4 is {1,2,3}, but
it can also be written as {3,2,1} or even as {1,2,1,2,3}. These are 
all the same set, by extensionality. For every element of {1,2,3} 
is also an element of {3,2,1} (and of {1,2,1,2,3}), and vice versa. 
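
The identities in this example can be checked mechanically. What follows is a minimal sketch, assuming Python's built-in `set` type as a stand-in for finite sets; the helper name `same_elements` is our own choice for illustration.

```python
# Three ways of writing down the set of positive integers less than 4.
a = {1, 2, 3}
b = {3, 2, 1}
c = {1, 2, 1, 2, 3}  # repeated entries are simply absorbed


def same_elements(x, y):
    """Extensionality spelled out: every element of x is in y, and vice versa."""
    return all(e in y for e in x) and all(e in x for e in y)


print(a == b == c)          # Python's own equality on sets
print(same_elements(a, c))  # the same test, written out by hand
```

Both tests agree because Python compares sets by membership only, which is exactly what extensionality demands.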


Frequently we’ll specify a set by some property that its ele- 
ments share. We’ll use the following shorthand notation for that: 
{x : φ(x)}, where the φ(x) stands for the property that x has to
have in order to be counted among the elements of the set. 


Example 1.3. In our example, we could have specified S also as 
S = {x : x is a sibling of Richard}.


Example 1.4. A number is called perfect iff it is equal to the sum 
of its proper divisors (i.e., numbers that evenly divide it but aren’t 
identical to the number). For instance, 6 is perfect because its 
proper divisors are 1, 2, and 3, and 6 = 1+2+3. In fact, 6 is 
the only positive integer less than 10 that is perfect. So, using 
extensionality, we can say: 


{6} = {x : x is perfect and 0 < x < 10} 


We read the notation on the right as “the set of x’s such that x 
is perfect and 0 < x < 10”. The identity here confirms that, 
when we consider sets, we don’t care about how they are spec- 
ified. And, more generally, extensionality guarantees that there 
is always only one set of x’s such that φ(x). So, extensionality
justifies calling {x : φ(x)} the set of x’s such that φ(x).
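
Specifying a set by a property has a close analogue in Python's set comprehensions. The following is a minimal sketch of Example 1.4 under that assumption; the function names `proper_divisors` and `is_perfect` are hypothetical names introduced for this illustration.

```python
def proper_divisors(n):
    """Numbers that evenly divide n but are not identical to n."""
    return [d for d in range(1, n) if n % d == 0]


def is_perfect(n):
    """A positive integer is perfect iff it equals the sum of its proper divisors."""
    return n > 0 and sum(proper_divisors(n)) == n


# {x : x is perfect and 0 < x < 10}
perfect_below_10 = {x for x in range(1, 10) if is_perfect(x)}

print(perfect_below_10 == {6})
```

The comparison at the end is the identity {6} = {x : x is perfect and 0 < x < 10}: two different specifications, one set.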


Extensionality gives us a way of showing that sets are identical:
to show that A = B, show that whenever x ∈ A then also
x ∈ B, and whenever y ∈ B then also y ∈ A.


1.2 Subsets and Power Sets


We will often want to compare sets. And one obvious kind of 
comparison one might make is as follows: everything in one set is 
in the other too. This situation is sufficiently important for us to 
introduce some new notation. 


Definition 1.5 (Subset). If every element of a set A is also an
element of B, then we say that A is a subset of B, and write A ⊆ B.
If A is not a subset of B we write A ⊈ B. If A ⊆ B but A ≠ B, we
write A ⊊ B and say that A is a proper subset of B.


Example 1.6. Every set is a subset of itself, and ∅ is a subset of
every set. The set of even numbers is a subset of the set of natural
numbers. Also, {a,b} ⊆ {a,b,c}. But {a,b,e} is not a subset of
{a,b,c}. 


Example 1.7. The number 2 is an element of the set of integers, 
whereas the set of even numbers is a subset of the set of integers. 
However, a set may happen to both be an element and a subset 
of some other set, e.g., {0} ∈ {0, {0}} and also {0} ⊆ {0, {0}}.


Extensionality gives a criterion of identity for sets: A = B 
iff every element of A is also an element of B and vice versa. 
The definition of “subset” defines A ⊆ B precisely as the first
half of this criterion: every element of A is also an element of B.
Of course the definition also applies if we switch A and B: that
is, B ⊆ A iff every element of B is also an element of A. And
that, in turn, is exactly the “vice versa” part of extensionality. In 
other words, extensionality entails that sets are equal iff they are 
subsets of one another. 
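
The relationship between subsets and extensionality can be sketched in code. This is only an illustration, assuming Python sets as a model of finite sets; the function names are our own, and `frozenset` is used for a set that is itself an element of another set, because Python requires set elements to be hashable.

```python
def is_subset(a, b):
    """A ⊆ B: every element of a is also an element of b."""
    return all(x in b for x in a)


def is_proper_subset(a, b):
    """A ⊊ B: a is a subset of b, but the two sets differ."""
    return is_subset(a, b) and a != b


def equal_by_inclusion(a, b):
    """Extensionality restated: a = b iff each is a subset of the other."""
    return is_subset(a, b) and is_subset(b, a)


small = {"a", "b"}
big = {"a", "b", "c"}
print(is_subset(small, big), is_subset(big, small))
print(is_subset(set(), big))  # the empty set is a subset of every set

# Example 1.7: {0} is both an element and a subset of {0, {0}}.
inner = frozenset({0})
outer = {0, inner}
print(inner in outer, is_subset(inner, outer))
```

Note that `is_subset(set(), big)` holds vacuously: there is no element of the empty set that could fail to be in `big`.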


Now is also a good opportunity to introduce some further 
bits of helpful notation. In defining when A is a subset of B 
we said that “every element of A is ...,” and filled the “...” with 
“an element of B”. But this is such a common shape of expression
that it will be helpful to introduce some formal notation for it. 


Writing (∀x ∈ A)φ for “every element x of A is such that φ”, we
can say that A ⊆ B iff (∀x ∈ A) x ∈ B.

Now we move on to considering a certain kind of set: the set 
of all subsets of a given set. 


Definition 1.10 (Power Set). The set consisting of all subsets
of a set A is called the power set of A, written ℘(A).


℘(A) = {B : B ⊆ A}


Example 1.11. What are all the possible subsets of {a,b,c}? 
They are: ∅, {a}, {b}, {c}, {a,b}, {a,c}, {b,c}, {a,b,c}. The
set of all these subsets is ℘({a,b,c}):


℘({a,b,c}) = {∅, {a}, {b}, {c}, {a,b}, {b,c}, {a,c}, {a,b,c}}
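
For finite sets, the power set can be computed by collecting the subsets of each size. The sketch below assumes Python's standard `itertools.combinations` and represents each subset as a `frozenset` so that subsets can themselves be elements of a set; the name `power_set` is our own.

```python
from itertools import combinations


def power_set(a):
    """Return the set of all subsets of the finite set a."""
    elems = list(a)
    return {
        frozenset(chosen)
        for size in range(len(elems) + 1)
        for chosen in combinations(elems, size)
    }


P = power_set({"a", "b", "c"})
print(len(P))
print(frozenset() in P, frozenset({"a", "b", "c"}) in P)
```

The empty set and the set itself always appear among the subsets, matching the first two claims of Example 1.6.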


1.3 Some Important Sets 


Example 1.12. We will mostly be dealing with sets whose ele- 
ments are mathematical objects. Four such sets are important 
enough to have specific names: 


N = {0, 1, 2, 3, ...}             the set of natural numbers
Z = {..., -2, -1, 0, 1, 2, ...}   the set of integers
Q = {m/n : m, n ∈ Z and n ≠ 0}    the set of rationals
R = (-∞, ∞)                       the set of real numbers (the continuum)

These are all infinite sets, that is, they each have infinitely many
elements. 

As we move through these sets, we are adding more numbers 
to our stock. Indeed, it should be clear that N ⊆ Z ⊆ Q ⊆ R:
after all, every natural number is an integer; every integer is a
rational; and every rational is a real. Equally, it should be clear
that N ⊊ Z ⊊ Q, since -1 is an integer but not a natural number,
and 1/2 is rational but not an integer. It is less obvious that Q ⊊ R,
i.e., that there are some real numbers which are not rational. 

We’ll sometimes also use the set of positive integers Z⁺ =
{1, 2, 3, ...} and the set containing just the first two natural
numbers B = {0, 1}.


Example 1.13 (Strings). Another interesting example is the set 
A* of finite strings over an alphabet A: any finite sequence of 
elements of A is a string over A. We include the empty string Λ
among the strings over A, for every alphabet A. For instance,


B* = {Λ, 0, 1, 00, 01, 10, 11,
000, 001, 010, 011, 100, 101, 110, 111, 0000, ...}.


If x = x₁ … xₙ ∈ A* is a string consisting of n “letters” from A,
then we say the length of the string is n and write len(x) = n.
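
Although A* is infinite, any finite initial part of it can be listed. The following minimal sketch enumerates all strings up to a given length, shortest first, in the same order as the display above. It assumes Python strings as a representation (with the empty string `""` playing the role of Λ) and uses the standard `itertools.product`; the name `strings_up_to` is hypothetical.

```python
from itertools import product


def strings_up_to(alphabet, max_len):
    """All strings over alphabet of length at most max_len, shortest first."""
    result = []
    for n in range(max_len + 1):
        # product(..., repeat=n) yields every sequence of n letters
        for letters in product(sorted(alphabet), repeat=n):
            result.append("".join(letters))
    return result


words = strings_up_to({"0", "1"}, 3)
print(words)
print([len(w) for w in words])  # len(x) for each string x
```

For the alphabet B = {0, 1} there are 1 + 2 + 4 + 8 strings of length at most 3.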


Example 1.14 (Infinite sequences). For any set A we may also 
consider the set A^ω of infinite sequences of elements of A. An
infinite sequence a₁a₂a₃a₄… consists of a one-way infinite list of
objects, each one of which is an element of A. 


1.4 Unions and Intersections 


In section 1.1, we introduced definitions of sets by abstraction, 
i.e., definitions of the form {x : φ(x)}. Here, we invoke some
property φ, and this property can mention sets we’ve already
defined. So for instance, if A and B are sets, the set {x : x ∈
A ∨ x ∈ B} consists of all those objects which are elements of either
A or B, i.e., it’s the set that combines the elements of A and B.
We can visualize this as in Figure 1.1, where the highlighted area
indicates the elements of the two sets A and B together.


Figure 1.1: The union A ∪ B of two sets is the set of elements of A together with
those of B.

This operation on sets—combining them—is very useful and 
common, and so we give it a formal name and a symbol. 


Definition 1.15 (Union). The union of two sets A and B, written
A ∪ B, is the set of all things which are elements of A, B, or
both.


A ∪ B = {x : x ∈ A ∨ x ∈ B}


Example 1.16. Since the multiplicity of elements doesn’t matter,
the union of two sets which have an element in common contains
that element only once, e.g., {a,b,c} ∪ {a,0,1} = {a,b,c,0,1}.

The union of a set and one of its subsets is just the bigger set:
{a,b,c} ∪ {a} = {a,b,c}.

The union of a set with the empty set is identical to the set:
{a,b,c} ∪ ∅ = {a,b,c}.


We can also consider a “dual” operation to union. This is the 
operation that forms the set of all elements that are elements of A 
and are also elements of B. This operation is called intersection, 
and can be depicted as in Figure 1.2. 




Figure 1.2: The intersection A ∩ B of two sets is the set of elements they have in
common. 


Definition 1.17 (Intersection). The intersection of two sets A
and B, written A ∩ B, is the set of all things which are elements
of both A and B.


A ∩ B = {x : x ∈ A ∧ x ∈ B}


Two sets are called disjoint if their intersection is empty. This 
means they have no elements in common. 


Example 1.18. If two sets have no elements in common, their 
intersection is empty: {a,b,c} ∩ {0,1} = ∅.

If two sets do have elements in common, their intersection is
the set of all those: {a,b,c} ∩ {a,b,d} = {a,b}.

The intersection of a set with one of its subsets is just the
smaller set: {a,b,c} ∩ {a,b} = {a,b}.

The intersection of any set with the empty set is empty:
{a,b,c} ∩ ∅ = ∅.
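
The definitions of union and intersection translate almost literally into set comprehensions. The sketch below is one possible rendering, assuming Python sets for finite sets; the function names `union`, `intersection`, and `are_disjoint` are our own, and Python also offers the built-in operators `|` and `&` for the same purpose.

```python
def union(a, b):
    """A ∪ B: everything that is an element of a, of b, or of both."""
    return {x for x in [*a, *b]}


def intersection(a, b):
    """A ∩ B: everything that is an element of both a and b."""
    return {x for x in a if x in b}


def are_disjoint(a, b):
    """Two sets are disjoint iff their intersection is empty."""
    return len(intersection(a, b)) == 0


abc = {"a", "b", "c"}
print(union(abc, {"a", 0, 1}))          # the shared element appears once
print(intersection(abc, {"a", "b", "d"}))
print(are_disjoint(abc, {0, 1}))
print(union(abc, set()) == abc, intersection(abc, set()) == set())
```

The last line checks the two facts about the empty set from Examples 1.16 and 1.18.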


We can also form the union or intersection of more than two 
sets. An elegant way of dealing with this in general is the follow- 
ing: suppose you collect all the sets you want to form the union 
(or intersection) of into a single set. Then we can define the union 
of all our original sets as the set of all objects which belong to at
least one element of the set, and the intersection as the set of all
objects which belong to every element of the set. 


Definition 1.19. If A is a set of sets, then ⋃A is the set of elements
of elements of A:


⋃A = {x : x belongs to an element of A}, i.e.,
   = {x : there is a B ∈ A so that x ∈ B}


Definition 1.20. If A is a set of sets, then ⋂A is the set of objects
which all elements of A have in common:


⋂A = {x : x belongs to every element of A}, i.e.,
   = {x : for all B ∈ A, x ∈ B}


Example 1.21. Suppose A = {{a,b},{a,d,e},{a,d}}. Then 
⋃A = {a,b,d,e} and ⋂A = {a}.
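
The union and intersection of a set of sets can be sketched in the same style. This illustration assumes a finite, non-empty family whose members are `frozenset`s (so that they can be elements of a Python set); the names `big_union` and `big_intersection` are hypothetical. We choose to reject the empty family in `big_intersection`, since "every element of the family" would then impose no condition at all.

```python
def big_union(family):
    """⋃A: the elements of elements of the family."""
    return {x for member in family for x in member}


def big_intersection(family):
    """⋂A: the objects that all members of the family have in common."""
    members = list(family)
    if not members:
        raise ValueError("this sketch does not define the intersection of an empty family")
    return {x for x in members[0] if all(x in member for member in members)}


# The family from Example 1.21.
A = {frozenset({"a", "b"}), frozenset({"a", "d", "e"}), frozenset({"a", "d"})}

print(big_union(A))
print(big_intersection(A))
```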


We could also do the same for a sequence of sets A₁, A₂, …


⋃ᵢ Aᵢ = {x : x belongs to one of the Aᵢ}
⋂ᵢ Aᵢ = {x : x belongs to every Aᵢ}.


When we have an index set, i.e., some set I such that
we are considering Aᵢ for each i ∈ I, we may also use these
abbreviations:


⋃_{i ∈ I} Aᵢ = ⋃{Aᵢ : i ∈ I}
⋂_{i ∈ I} Aᵢ = ⋂{Aᵢ : i ∈ I}


Finally, we may want to think about the set of all elements 
in A which are not in B. We can depict this as in Figure 1.3. 


Figure 1.3: The difference A \ B of two sets is the set of those elements of A
which are not also elements of B. 


Definition 1.22 (Difference). The set difference A \ B is the set 
of all elements of A which are not also elements of B, i.e., 


A \ B = {x : x ∈ A and x ∉ B}.
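
The set difference has an equally direct rendering. This is a minimal sketch assuming Python sets; the name `difference` is our own, and Python's built-in `-` operator on sets computes the same thing.

```python
def difference(a, b):
    """A \\ B: the elements of a which are not also elements of b."""
    return {x for x in a if x not in b}


abc = {"a", "b", "c"}
print(difference(abc, {"b", "d"}))       # "d" is irrelevant: it was never in abc
print(difference(abc, set()) == abc)     # removing nothing changes nothing
print(difference(abc, abc) == set())     # removing everything leaves the empty set
```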


1.5 Pairs, Tuples, Cartesian Products 


It follows from extensionality that sets have no order to their 
elements. So if we want to represent order, we use ordered pairs 
(x,y). In an unordered pair {x,y}, the order does not matter: 
{x,y} = {y,x}. In an ordered pair, it does: if x ≠ y, then (x,y) ≠
(y,x).

How should we think about ordered pairs in set theory? Cru- 
cially, we want to preserve the idea that ordered pairs are iden- 
tical iff they share the same first element and share the same 
second element, i.e.: 


(a,b) = (c,d) iff both a = c and b = d.


We can define ordered pairs in set theory using the Wiener- 
Kuratowski definition. 




Definition 1.23 (Ordered pair). (a,b) = {{a},{a,b}}. 


Having fixed a definition of an ordered pair, we can use it 
to define further sets. For example, sometimes we also want or- 
dered sequences of more than two objects, e.g., triples (x,y,z), 
quadruples (x,y,z,u), and so on. We can think of triples as spe- 
cial ordered pairs, where the first element is itself an ordered pair: 
(x,y,z) is ((x,y),z). The same is true for quadruples: (x,y,z,u)
is (((x,y),z),u), and so on. In general, we talk of ordered n-tuples
(x₁, …, xₙ).
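
The Wiener-Kuratowski definition can be tried out concretely. The sketch below builds pairs out of nothing but `frozenset`s and checks, by brute force over a small domain, that they behave as ordered pairs should. This is a sanity check on finitely many cases, not a proof; the names `kpair` and `ktriple` are hypothetical.

```python
from itertools import product


def kpair(a, b):
    """Wiener-Kuratowski pair: (a, b) = {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def ktriple(a, b, c):
    """A triple as a pair whose first element is itself a pair: ((a, b), c)."""
    return kpair(kpair(a, b), c)


domain = range(3)
# (a, b) = (c, d) iff both a = c and b = d, tested for all choices from the domain.
pairs_behave = all(
    (kpair(a, b) == kpair(c, d)) == (a == c and b == d)
    for a, b, c, d in product(domain, repeat=4)
)

print(pairs_behave)
print(kpair(0, 1) == kpair(1, 0))  # order matters for pairs
print(kpair(2, 2))                 # collapses to {{2}} by extensionality
```

The last line shows a quirk of the definition: when both components coincide, {a} and {a, a} are the same set, so the pair has only one element.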

Certain sets of ordered pairs, or other ordered n-tuples, will 
be useful. 


Definition 1.24 (Cartesian product). Given sets A and B,
their Cartesian product A × B is defined by


A × B = {(x,y) : x ∈ A and y ∈ B}.


Example 1.25. If A = {0,1}, and B = {1,a,b}, then their product
is


A × B = {(0,1), (0,a), (0,b), (1,1), (1,a), (1,b)}.


Example 1.26. If A is a set, the product of A with itself, A × A,
is also written A². It is the set of all pairs (x,y) with x,y ∈ A. The
set of all triples (x,y,z) is A³, and so on. We can give a recursive
definition:


A^1 = A
A^(k+1) = A^k × A


Proposition 1.27. If A has n elements and B has m elements, then
A × B has n·m elements.


Proof. For every element x in A, there are m elements of the form
(x,y) ∈ A × B. Let B_x = {(x,y) : y ∈ B}. Since whenever x₁ ≠ x₂,
(x₁,y) ≠ (x₂,y), B_{x₁} ∩ B_{x₂} = ∅. But if A = {x₁, …, xₙ}, then
A × B = B_{x₁} ∪ ⋯ ∪ B_{xₙ}, and so has n·m elements.

To visualize this, arrange the elements of A × B in a grid:


B_{x₁} = {(x₁,y₁)  (x₁,y₂)  …  (x₁,yₘ)}
B_{x₂} = {(x₂,y₁)  (x₂,y₂)  …  (x₂,yₘ)}
   ⋮
B_{xₙ} = {(xₙ,y₁)  (xₙ,y₂)  …  (xₙ,yₘ)}


Since the xᵢ are all different, and the yⱼ are all different, no two of
the pairs in this grid are the same, and there are n·m of them. □
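
The counting argument can be replayed on the sets of Example 1.25. The following is a minimal sketch that assumes Python tuples as ordered pairs (rather than the set-theoretic definition) and, for powers, the recursive reading A^1 = A, A^(k+1) = A^k × A; the names `cartesian_product`, `cartesian_power`, and `rows` are our own.

```python
def cartesian_product(a, b):
    """A × B = {(x, y) : x ∈ A and y ∈ B}, with tuples standing in for pairs."""
    return {(x, y) for x in a for y in b}


def cartesian_power(a, k):
    """A^k under the assumed reading A^1 = A, A^(k+1) = A^k × A."""
    if k < 1:
        raise ValueError("k must be at least 1")
    result = set(a)
    for _ in range(k - 1):
        result = cartesian_product(result, a)
    return result


A = {0, 1}
B = {1, "a", "b"}
AxB = cartesian_product(A, B)

# One row of the grid for each x in A: the pairs whose first element is x.
rows = {x: {(x, y) for y in B} for x in A}

print(len(AxB) == len(A) * len(B))
print(set().union(*rows.values()) == AxB)   # the rows together make up A × B
print(rows[0] & rows[1] == set())           # and distinct rows are disjoint
print(len(cartesian_power({1, 2, 3}, 3)))   # triples appear as ((x, y), z)
```

The rows correspond to the sets B_x in the proof: each has m elements, they do not overlap, and there are n of them.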


Example 1.28. If A is a set, a word over A is any sequence of 
elements of A. A sequence can be thought of as an n-tuple of ele- 
ments of A. For instance, if A = {a,b,c}, then the sequence “bac” 
can be thought of as the triple (b,a,c). Words, i.e., sequences of 
symbols, are of crucial importance in computer science. By convention,
we count elements of A as sequences of length 1, and ∅
as the sequence of length 0. The set of all words over A then is


A* = {∅} ∪ A ∪ A² ∪ A³ ∪ …


1.6 Russell’s Paradox 


Extensionality licenses the notation {x : φ(x)}, for the set of x’s
such that φ(x). However, all that extensionality really licenses is
the following thought. If there is a set whose members are all
and only the φ’s, then there is only one such set. Otherwise put:
having fixed some φ, the set {x : φ(x)} is unique, if it exists.

But this conditional is important! Crucially, not every prop- 
erty lends itself to comprehension. That is, some properties do not 
define sets. If they all did, then we would run into outright contra- 
dictions. The most famous example of this is Russell’s Paradox. 

Sets may be elements of other sets: for instance, the power
set of a set A is made up of sets. And so it makes sense to ask or 
investigate whether a set is an element of another set. Can a set 
be a member of itself? Nothing about the idea of a set seems to 
rule this out. For instance, if all sets form a collection of objects, 
one might think that they can be collected into a single set—the 
set of all sets. And it, being a set, would be an element of the set 
of all sets. 

Russell’s Paradox arises when we consider the property of not 
having itself as an element, of being non-self-membered. What if we 
suppose that there is a set of all sets that do not have themselves 
as an element? Does 


R = {x : x ∉ x}


exist? It turns out that we can prove that it does not. 


Proof. If R = {x : x ∉ x} exists, then R ∈ R iff R ∉ R, which is a
contradiction. □


Let’s run through this proof more slowly. If R exists, it makes 
sense to ask whether R ∈ R or not. Suppose that indeed R ∈ R.
Now, R was defined as the set of all sets that are not elements of
themselves. So, if R ∈ R, then R does not itself have R’s defining
property. But only sets that have this property are in R, hence, R
cannot be an element of R, i.e., R ∉ R. But R can’t both be and
not be an element of R, so we have a contradiction.

Since the assumption that R ∈ R leads to a contradiction, we
have R ∉ R. But this also leads to a contradiction! For if R ∉ R,
then R itself does have R’s defining property, and so R would be 
an element of R just like all the other non-self-membered sets. 
And again, it can’t both not be and be an element of R. 

How do we set up a set theory which avoids falling into Russell’s
Paradox, i.e., which avoids making the inconsistent claim that
R = {x : x ∉ x} exists? Well, we would need to lay down axioms
which give us very precise conditions for stating when sets exist 
(and when they don’t). 

The set theory sketched in this chapter doesn’t do this. It’s 
genuinely naive. It tells you only that sets obey extensionality and 
that, if you have some sets, you can form their union, intersection, 
etc. It is possible to develop set theory more rigorously than 
this. 


Summary 


A set is a collection of objects, the elements of the set. We write 
x ∈ A if x is an element of A. Sets are extensional: they are
completely determined by their elements. Sets are specified by 
listing the elements explicitly or by giving a property the ele- 
ments share (abstraction). Extensionality means that the order 
or way of listing or specifying the elements of a set doesn’t mat- 
ter. To prove that A and B are the same set (A = B) one has to 
prove that every element of A is an element of B and vice versa.

Important sets include the natural (N), integer (Z), rational 
(Q), and real (R) numbers, but also strings (X*) and infinite
sequences (X^ω) of objects. A is a subset of B, A ⊆ B, if every
element of A is also one of B. The collection of all subsets of
a set B is itself a set, the power set ℘(B) of B. We can form
the union A ∪ B and intersection A ∩ B of sets. An ordered
pair (x,y) consists of two objects x and y, but in that specific 
order. The pairs (x,y) and (y,x) are different pairs (unless x = y). 
The set of all pairs (x,y) where x € A and y € B is called the 
Cartesian product A x B of A and B. We write A’ for Ax A; so 
for instance N? is the set of pairs of natural numbers. 


Problems 


Problem 1.1. Prove that there is at most one empty set, i.e., 
show that if A and B are sets without elements, then A = B.


Problem 1.2. List all subsets of {a,b,c,d}.


Problem 1.3. Show that if A has n elements, then ℘(A) has 2^n
elements.


Problem 1.4. Prove that if A ⊆ B, then A ∪ B = B.

Problem 1.5. Prove rigorously that if A ⊆ B, then A ∩ B = A.

Problem 1.6. Show that if A is a set and A ∈ B, then A ⊆ ⋃B.

Problem 1.7. Prove that if A ⊊ B, then B \ A ≠ ∅.


Problem 1.8. Using Definition 1.23, prove that (a,b) = (c,d) iff 
both a=c and b=d. 


Problem 1.9. List all elements of {1,2,3}³.


Problem 1.10. Show, by induction on k, that for all k ≥ 1, if A
has n elements, then A^k has n^k elements.


CHAPTER 2 


Relations 


2.1 Relations as Sets 


In section 1.3, we mentioned some important sets: N, Z, Q, R. 
You will no doubt remember some interesting relations between 
the elements of some of these sets. For instance, each of these sets 
has a completely standard order relation on it. There is also the 
relation is identical with that every object bears to itself and to no 
other thing. There are many more interesting relations that we'll 
encounter, and even more possible relations. Before we review 
them, though, we will start by pointing out that we can look at 
relations as a special sort of set. 

For this, recall two things from section 1.5. First, recall the
notion of an ordered pair: given a and b, we can form (a,b).
Importantly, the order of elements does matter here. So if a ≠ b
then (a,b) ≠ (b,a). (Contrast this with unordered pairs, i.e.,
2-element sets, where {a,b} = {b,a}.) Second, recall the notion of
a Cartesian product: if A and B are sets, then we can form A × B,
the set of all pairs (x,y) with x ∈ A and y ∈ B. In particular,
A² = A × A is the set of all ordered pairs from A.

Now we will consider a particular relation on a set: the <- 
relation on the set N of natural numbers. Consider the set of all 
pairs of numbers (n,m) where n < m, i.e., 


R = {(n,m) : n, m ∈ N and n < m}.


There is a close connection between n being less than m, and the
pair (n,m) being a member of R, namely:


n < m iff (n,m) ∈ R.


Indeed, without any loss of information, we can consider the set 
R to be the <-relation on N. 

In the same way we can construct a subset of N² for any relation
between numbers. Conversely, given any set of pairs of numbers
S ⊆ N², there is a corresponding relation between numbers,
namely, the relationship n bears to m if and only if (n,m) ∈ S.
This justifies the following definition: 


Definition 2.1 (Binary relation). A binary relation on a set A is
a subset of A². If R ⊆ A² is a binary relation on A and x, y ∈ A,
we sometimes write Rxy (or xRy) for (x,y) ∈ R.


Example 2.2. The set N² of pairs of natural numbers can be
listed in a 2-dimensional matrix like this:


(0,0) (0,1) (0,2) (0,3) ...
(1,0) (1,1) (1,2) (1,3) ...
(2,0) (2,1) (2,2) (2,3) ...
(3,0) (3,1) (3,2) (3,3) ...
...


Consider the diagonal of this matrix. The subset of N²
consisting of the pairs lying on the diagonal, i.e.,


{(0,0), (1,1), (2,2), ...},


is the identity relation on N. (Since the identity relation is popular,
let’s define Id_A = {(x,x) : x ∈ A} for any set A.) The subset of all
pairs lying above the diagonal, i.e.,


L = {(0,1), (0,2), ..., (1,2), (1,3), ..., (2,3), (2,4), ...},


is the less than relation, i.e., Lnm iff n < m. The subset of pairs
below the diagonal, i.e.,


G = {(1,0), (2,0), (2,1), (3,0), (3,1), (3,2),...}, 


is the greater than relation, i.e., Gnm iff n > m. The union of L
with Id_N, which we might call K = L ∪ Id_N, is the less than or equal to
relation: Knm iff n ≤ m. Similarly, H = G ∪ Id_N is the greater than
or equal to relation. These relations L, G, K, and H are special
kinds of relations called orders. L and G have the property that
no number bears L or G to itself (i.e., for all n, neither Lnn nor
Gnn). Relations with this property are called irreflexive, and, if
they also happen to be orders, they are called strict orders.


Although orders and identity are important and natural relations,
it should be emphasized that according to our definition
any subset of A² is a relation on A, regardless of how unnatural
or contrived it seems. In particular, ∅ is a relation on
any set (the empty relation, which no pair of elements bears),
and A² itself is a relation on A as well (one which every pair
bears), called the universal relation. But also something like
E = {(n,m) : n > 5 or m × n ≥ 34} counts as a relation.
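
The idea that a relation simply is a set of pairs can be tried out on a finite fragment of N. The following is a minimal sketch, not part of the text's formal development; the four-element set A and all variable names are assumptions chosen for illustration.

```python
# A finite fragment of N; the choice of four elements is an assumption.
A = range(4)

# A binary relation on A is just a set of ordered pairs drawn from A x A.
identity = {(x, x) for x in A}                       # Id_A
less = {(n, m) for n in A for m in A if n < m}       # L restricted to A
greater = {(n, m) for n in A for m in A if n > m}    # G restricted to A
less_or_equal = less | identity                      # K = L ∪ Id_A

universal = {(x, y) for x in A for y in A}           # A^2, the universal relation
empty = set()                                        # the empty relation


def holds(R, x, y):
    """Rxy abbreviates: the pair (x, y) is an element of R."""
    return (x, y) in R


print(holds(less, 1, 3), holds(less, 3, 1))
```

Every pair of the finite matrix lies on, above, or below the diagonal, so the union of the three relations is the universal relation on A.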


2.2 Special Properties of Relations 


Some kinds of relations turn out to be so common that they have 
been given special names. For instance, ≤ and ⊆ both relate their
respective domains (say, N in the case of ≤ and ℘(A) in the case
of ⊆) in similar ways. To get at exactly how these relations are
similar, and how they differ, we categorize them according to
some special properties that relations can have. It turns out that
(combinations of) some of these special properties are especially
important: orders and equivalence relations.


Definition 2.3 (Reflexivity). A relation R ⊆ A² is reflexive iff,
for every x ∈ A, Rxx.


Definition 2.4 (Transitivity). A relation R ⊆ A² is transitive iff,
whenever Rxy and Ryz, then also Rxz.


Definition 2.5 (Symmetry). A relation R ⊆ A² is symmetric iff,
whenever Rxy, then also Ryx.


Definition 2.6 (Anti-symmetry). A relation R ⊆ A² is anti-symmetric
iff, whenever both Rxy and Ryx, then x = y (or, in other
words: if x ≠ y then either ¬Rxy or ¬Ryx).


In a symmetric relation, Rxy and Ryx always hold together, 
or neither holds. In an anti-symmetric relation, the only way for 
Rxy and Ryx to hold together is if x = y. Note that this does not
require that Rxy and Ryx hold when x = y, only that it isn’t ruled
out. So an anti-symmetric relation can be reflexive, but it is not 
the case that every anti-symmetric relation is reflexive. Also note 
that being anti-symmetric and merely not being symmetric are 
different conditions. In fact, a relation can be both symmetric 
and anti-symmetric at the same time (e.g., the identity relation 
is). 


Definition 2.7 (Connectivity). A relation R ⊆ A² is connected
if for all x, y ∈ A, if x ≠ y, then either Rxy or Ryx.


Definition 2.8 (Irreflexivity). A relation R ⊆ A² is called
irreflexive if, for all x ∈ A, not Rxx.


Definition 2.9 (Asymmetry). A relation R ⊆ A² is called asymmetric
if for no pair x, y ∈ A we have both Rxy and Ryx.


Note that if A ≠ ∅, then no irreflexive relation on A is reflexive
and every asymmetric relation on A is also anti-symmetric.
However, there are R ⊆ A² that are not reflexive and also not
irreflexive, and there are anti-symmetric relations that are not
asymmetric.
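
For finite relations, each of the properties defined in this section can be checked mechanically. The sketch below assumes relations are represented as Python sets of pairs; the function names and the three sample relations are illustrative assumptions, not notation from the text.

```python
def is_reflexive(R, A):
    return all((x, x) in R for x in A)


def is_irreflexive(R, A):
    return all((x, x) not in R for x in A)


def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)


def is_antisymmetric(R):
    # Whenever both Rxy and Ryx hold, x and y must be identical.
    return all(x == y for (x, y) in R if (y, x) in R)


def is_asymmetric(R):
    return all((y, x) not in R for (x, y) in R)


def is_transitive(R):
    return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)


def is_connected(R, A):
    return all((x, y) in R or (y, x) in R for x in A for y in A if x != y)


A = {0, 1, 2}
identity = {(x, x) for x in A}
less = {(x, y) for x in A for y in A if x < y}
mixed = {(0, 0), (0, 1)}   # neither reflexive nor irreflexive on A
```

The sample relations illustrate the remarks above: the identity relation is both symmetric and anti-symmetric, and `less | identity` is anti-symmetric without being asymmetric.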


2.3 Equivalence Relations 


The identity relation on a set is reflexive, symmetric, and transi- 
tive. Relations R that have all three of these properties are very 
common. 


Definition 2.10 (Equivalence relation). A relation R ⊆ A²
that is reflexive, symmetric, and transitive is called an equivalence
relation. Elements x and y of A are said to be R-equivalent if Rxy.


Equivalence relations give rise to the notion of an equivalence 
class. An equivalence relation “chunks up” the domain into differ- 
ent partitions. Within each partition, all the objects are related 
to one another; and no objects from different partitions relate 
to one another. Sometimes, it’s helpful just to talk about these 
partitions directly. To that end, we introduce a definition: 


Definition 2.11. Let R ⊆ A² be an equivalence relation. For
each x ∈ A, the equivalence class of x in A is the set [x]_R = {y ∈
A : Rxy}. The quotient of A under R is A/R = {[x]_R : x ∈ A}, i.e.,
the set of these equivalence classes.
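
On a finite set, equivalence classes and the quotient can be computed directly from the definition. This is a minimal sketch under the assumption that the relation is given as a set of pairs; the "same length" relation on a handful of strings is a hypothetical example.

```python
def equivalence_class(R, A, x):
    """[x]_R = {y in A : Rxy}; a frozenset, so that classes can be set elements."""
    return frozenset(y for y in A if (x, y) in R)


def quotient(R, A):
    """A/R = {[x]_R : x in A}."""
    return {equivalence_class(R, A, x) for x in A}


# Hypothetical example: two strings are related iff they have the same length.
A = {"a", "b", "ab", "ba", "abc"}
same_length = {(s, t) for s in A for t in A if len(s) == len(t)}

for cls in quotient(same_length, A):
    print(sorted(cls))
```

Because the quotient is a set, elements with the same class contribute only one member to it.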


The next result vindicates the definition of an equivalence
class, in proving that the equivalence classes are indeed the
partitions of A:


Proposition 2.12. If R ⊆ A² is an equivalence relation, then Rxy
iff [x]_R = [y]_R.


Proof. For the left-to-right direction, suppose Rxy, and let z ∈
[x]_R. By definition, then, Rxz. Since R is an equivalence relation,
Ryz. (Spelling this out: as Rxy and R is symmetric we have
Ryx, and as Rxz and R is transitive we have Ryz.) So z ∈ [y]_R.
Generalising, [x]_R ⊆ [y]_R. But exactly similarly, [y]_R ⊆ [x]_R. So
[x]_R = [y]_R, by extensionality.

For the right-to-left direction, suppose [x]_R = [y]_R. Since R is
reflexive, Ryy, so y ∈ [y]_R. Thus also y ∈ [x]_R by the assumption
that [x]_R = [y]_R. So Rxy. □


Example 2.13. A nice example of equivalence relations comes
from modular arithmetic. For any a, b, and n ∈ N, say that a ≡_n b
iff dividing a by n gives the same remainder as dividing b by n.
(Somewhat more symbolically: a ≡_n b iff, for some k ∈ Z, a − b =
kn.) Now, ≡_n is an equivalence relation, for any n. And there
are exactly n distinct equivalence classes generated by ≡_n; that
is, N/≡_n has n elements. These are: the set of numbers divisible
by n without remainder, i.e., [0]_≡n; the set of numbers that leave
remainder 1 when divided by n, i.e., [1]_≡n; ...; and the set of
numbers that leave remainder n − 1 when divided by n, i.e.,
[n − 1]_≡n.
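
The congruence classes of this example can be listed for a finite initial segment of N. The sketch below assumes n = 3 and the segment 0, ..., 11; both choices, and the function name, are illustrative only.

```python
def congruent(a, b, n):
    """a is congruent to b modulo n (n > 0): same remainder on division by n."""
    return a % n == b % n


n = 3
A = range(12)   # a finite initial segment of N, chosen for illustration

# One class per remainder; frozensets so that the classes can be collected in a set.
classes = {frozenset(b for b in A if congruent(a, b, n)) for a in A}

for cls in sorted(classes, key=min):
    print(sorted(cls))
```

On this segment the two characterisations in the example agree: the remainders match exactly when the difference a − b is a multiple of n.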


2.4 Orders 


Many of our comparisons involve describing some objects as be- 
ing “less than”, “equal to”, or “greater than” other objects, in a 
certain respect. These involve order relations. But there are differ- 
ent kinds of order relations. For instance, some require that any 
two objects be comparable, others don’t. Some include identity 
(like ≤) and some exclude it (like <). It will help us to have a
taxonomy here.


Definition 2.14 (Preorder). A relation which is both reflexive
and transitive is called a preorder.


Definition 2.15 (Partial order). A preorder which is also anti- 
symmetric is called a partial order. 


Definition 2.16 (Linear order). A partial order which is also 
connected is called a total order or linear order. 


Example 2.17. Every linear order is also a partial order, and 
every partial order is also a preorder, but the converses don’t 
hold. The universal relation on A is a preorder, since it is reflexive 
and transitive. But, if A has more than one element, the universal 
relation is not anti-symmetric, and so not a partial order. 


Example 2.18. Consider the no longer than relation ≼ on B*: x ≼
y iff len(x) ≤ len(y). This is a preorder (reflexive and transitive),
and even connected, but not a partial order, since it is not
anti-symmetric. For instance, 01 ≼ 10 and 10 ≼ 01, but 01 ≠ 10.


Example 2.19. An important partial order is the relation ⊆ on a
set of sets. This is not in general a linear order, since if a ≠ b and
we consider ℘({a,b}) = {∅, {a}, {b}, {a,b}}, we see that {a} ⊈ {b}
and {a} ≠ {b} and {b} ⊈ {a}.
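
The taxonomy of preorders, partial orders, and linear orders can be checked on small finite instances of the last two examples. This is a sketch only: the helper names are assumptions, and the "no longer than" relation is restricted to four sample strings rather than all of B*.

```python
from itertools import combinations


def is_preorder(R, A):
    reflexive = all((x, x) in R for x in A)
    transitive = all((x, z) in R for (x, y) in R for (w, z) in R if y == w)
    return reflexive and transitive


def is_partial_order(R, A):
    antisymmetric = all(x == y for (x, y) in R if (y, x) in R)
    return is_preorder(R, A) and antisymmetric


def is_linear_order(R, A):
    connected = all((x, y) in R or (y, x) in R for x in A for y in A if x != y)
    return is_partial_order(R, A) and connected


# The power set of {a, b}, ordered by inclusion.
base = ["a", "b"]
power_set = {frozenset(c) for k in range(len(base) + 1) for c in combinations(base, k)}
subset = {(s, t) for s in power_set for t in power_set if s <= t}

# A few binary strings, ordered by "no longer than".
strings = {"0", "1", "01", "10"}
no_longer = {(s, t) for s in strings for t in strings if len(s) <= len(t)}

print(is_partial_order(subset, power_set), is_linear_order(subset, power_set))
print(is_preorder(no_longer, strings), is_partial_order(no_longer, strings))
```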


Example 2.20. The relation of divisibility without remainder gives 
us a partial order which isn’t a linear order. For integers n, m, we 
write n | m to mean n (evenly) divides m, i.e., there is some
integer k so that m = kn. On N, this is a partial order, but not
a linear order: for instance, 2 ∤ 3 and also 3 ∤ 2. Considered
as a relation on Z, divisibility is only a preorder since it is not
anti-symmetric: 1 | −1 and −1 | 1 but 1 ≠ −1.


Definition 2.21 (Strict order). A strict order is a relation which
is irreflexive, asymmetric, and transitive.


Definition 2.22 (Strict linear order). A strict order which is 
also connected is called a strict linear order. 


Example 2.23. ≤ is the linear order corresponding to the strict
linear order <. ⊆ is the partial order corresponding to the strict
order ⊊.


Definition 2.24 (Total order). A strict order which is also con- 
nected is called a total order. This is also sometimes called a strict 
linear order. 


Any strict order R on A can be turned into a partial order by
adding the diagonal Id_A, i.e., adding all the pairs (x,x). (This
is called the reflexive closure of R.) Conversely, starting from a
partial order, one can get a strict order by removing Id_A. These
next two results make this precise.


Proposition 2.25. If R is a strict order on A, then R⁺ = R ∪ Id_A
is a partial order. Moreover, if R is a total order, then R⁺ is a
linear order.


Proof. Suppose R is a strict order, i.e., R ⊆ A² and R is irreflexive,
asymmetric, and transitive. Let R⁺ = R ∪ Id_A. We have to show
that R⁺ is reflexive, antisymmetric, and transitive.

R⁺ is clearly reflexive, since (x,x) ∈ Id_A ⊆ R⁺ for all x ∈ A.

To show R⁺ is antisymmetric, suppose for reductio that R⁺xy
and R⁺yx but x ≠ y. Since (x,y) ∈ R ∪ Id_A, but (x,y) ∉ Id_A, we
must have (x,y) ∈ R, i.e., Rxy. Similarly, Ryx. But this contradicts
the assumption that R is asymmetric.

To establish transitivity, suppose that R⁺xy and R⁺yz. If both
(x,y) ∈ R and (y,z) ∈ R, then (x,z) ∈ R since R is transitive.
Otherwise, either (x,y) ∈ Id_A, i.e., x = y, or (y,z) ∈ Id_A, i.e.,
y = z. In the first case, we have that R⁺yz by assumption, x = y,
hence R⁺xz. Similarly in the second case. In either case, R⁺xz,
thus, R⁺ is also transitive.

Concerning the “moreover” clause, suppose R is a total order,
i.e., that R is connected. So for all x ≠ y, either Rxy or Ryx, i.e.,
either (x,y) ∈ R or (y,x) ∈ R. Since R ⊆ R⁺, this remains true of
R⁺, so R⁺ is connected as well. □
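
The passage between strict and non-strict orders can be observed on a small finite example. The sketch below takes the usual < on {1, 2, 3, 4} as the strict order; the set, the variable names, and the helper functions are assumptions made for illustration, and checking one instance is of course no substitute for the proof.

```python
A = {1, 2, 3, 4}
identity = {(x, x) for x in A}                      # Id_A
strict = {(x, y) for x in A for y in A if x < y}    # a strict order on A

closure = strict | identity      # reflexive closure: R ∪ Id_A
stripped = closure - identity    # removing Id_A again


def is_reflexive(R, A):
    return all((x, x) in R for x in A)


def is_antisymmetric(R):
    return all(x == y for (x, y) in R if (y, x) in R)


def is_transitive(R):
    return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)


print(is_reflexive(closure, A), is_antisymmetric(closure), is_transitive(closure))
```

In this instance the reflexive closure is exactly ≤ on A, and removing the diagonal returns the strict order we started with.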


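Proposition 2.26. If R is a partial order on A, then R⁻ = R \ Id_A
is a strict order. Moreover, if R is linear, then R⁻ is total.


Proof. This is left as an exercise. □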


Example 2.27. ≤ is the linear order corresponding to the total
order <. ⊆ is the partial order corresponding to the strict order ⊊.


The following simple result establishes that total orders
satisfy an extensionality-like property:


Proposition 2.28. If < totally orders A, then:

(∀a, b ∈ A)((∀x ∈ A)(x < a ↔ x < b) → a = b)


Proof. Suppose (∀x ∈ A)(x < a ↔ x < b). If a < b, then a < a,
contradicting the fact that < is irreflexive; so a ≮ b. Exactly
similarly, b ≮ a. So a = b, as < is connected. □


2.5 Graphs 


A graph is a diagram in which points—called “nodes” or “ver- 
tices” (plural of “vertex”)—are connected by edges. Graphs are 
a ubiquitous tool in discrete mathematics and in computer sci- 
ence. They are incredibly useful for representing, and visualizing, 
relationships and structures, from concrete things like networks 
of various kinds to abstract structures such as the possible outcomes
of decisions. There are many different kinds of graphs in
the literature which differ, e.g., according to whether the edges
are directed or not, have labels or not, whether there can be edges 
from a node to the same node, multiple edges between the same 
nodes, etc. Directed graphs have a special connection to relations. 


Definition 2.29 (Directed graph). A directed graph G = (V,E)
is a set of vertices V and a set of edges E ⊆ V².


According to our definition, a graph just is a set together with 
a relation on that set. Of course, when talking about graphs, it’s 
only natural to expect that they are graphically represented: we 
can draw a graph by connecting two vertices v1 and v2 by an
arrow iff (v1,v2) ∈ E. The only difference between a relation by
itself and a graph is that a graph specifies the set of vertices, i.e., a
graph may have isolated vertices. The important point, however,
is that every relation R on a set X can be seen as a directed graph
(X,R), and conversely, a directed graph (V,E) can be seen as a
relation E ⊆ V² with the set V explicitly specified.


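Example 2.30. The graph (V,E) with V = {1,2,3,4} and E =
{(1,1), (1,2), (1,3), (2,3)} looks like this:


[Figure: vertices 1, 2, 3, 4, with an arrow from 1 to itself, arrows
from 1 to 2, from 1 to 3, and from 2 to 3, and no arrows at 4.]


This is a different graph than (V′,E) with V′ = {1,2,3}, which
looks like this:


[Figure: the same diagram without the vertex 4.]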
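
The two graphs of this example can be represented as a vertex set together with an edge set. The following is a minimal sketch; the helper names `successors` and `isolated` are assumptions introduced here, not terminology from the text.

```python
V = {1, 2, 3, 4}
E = {(1, 1), (1, 2), (1, 3), (2, 3)}


def successors(E, v):
    """All vertices w with an arrow from v to w."""
    return {w for (u, w) in E if u == v}


def isolated(V, E):
    """Vertices that occur in no edge at all."""
    touched = {u for (u, _) in E} | {w for (_, w) in E}
    return V - touched


# An adjacency view of the graph (V, E): each vertex with the targets of its arrows.
adjacency = {v: successors(E, v) for v in V}
print(adjacency)
print(isolated(V, E))
```

The edge relation is the same in both graphs; only the explicitly given vertex set tells them apart.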


2.6 Operations on Relations 


It is often useful to modify or combine relations. In Proposi- 
tion 2.25, we considered the union of relations, which is just the 
union of two relations considered as sets of pairs. Similarly, in 
Proposition 2.26, we considered the relative difference of rela- 
tions. Here are some other operations we can perform on rela- 
tions. 


Definition 2.31. Let R, S be relations, and A be any set.
The inverse of R is R⁻¹ = {(y,x) : (x,y) ∈ R}.
The relative product of R and S is (R | S) = {(x,z) : ∃y(Rxy ∧ Syz)}.
The restriction of R to A is R↾A = R ∩ A².
The application of R to A is R[A] = {y : (∃x ∈ A)Rxy}.
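
For finite relations, the four operations translate almost word for word into set comprehensions. This is a sketch under the assumption that relations are Python sets of pairs; the sample relation S (the pairs (x, x + 1) on a finite window of integers) and all names are illustrative.

```python
def inverse(R):
    return {(y, x) for (x, y) in R}


def relative_product(R, S):
    # (x, z) is included iff some y links them: Rxy and Syz.
    return {(x, z) for (x, y) in R for (w, z) in S if y == w}


def restriction(R, A):
    # R ∩ A^2: keep only the pairs whose components both lie in A.
    return {(x, y) for (x, y) in R if x in A and y in A}


def application(R, A):
    return {y for (x, y) in R if x in A}


# Pairs (x, x + 1) for x in a finite window of Z; the window is an assumption.
window = range(-3, 6)
S = {(x, x + 1) for x in window}

print(sorted(application(S, {1, 2, 3})))
```

Because S is cut off at the edge of the window, its relative product with itself is missing the pair that would start at the last element of the window.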


Example 2.32. Let S ⊆ Z² be the successor relation on Z, i.e.,
S = {(x,y) ∈ Z² : x + 1 = y}, so that Sxy iff x + 1 = y.
S⁻¹ is the predecessor relation on Z, i.e., {(x,y) ∈ Z² : x − 1 = y}.
S | S is {(x,y) ∈ Z² : x + 2 = y}.
S↾N is the successor relation on N.
S[{1,2,3}] is {2,3,4}.


Definition 2.33 (Transitive closure). Let R ⊆ A² be a binary
relation.
The transitive closure of R is R⁺ = ⋃_{0<n∈N} Rⁿ, where we
recursively define R¹ = R and Rⁿ⁺¹ = Rⁿ | R.
The reflexive transitive closure of R is R* = R⁺ ∪ Id_A.


Example 2.34. Take the successor relation S ⊆ Z². S²xy iff x +
2 = y, S³xy iff x + 3 = y, etc. So S⁺xy iff x + n = y for some n ≥ 1.
In other words, S⁺xy iff x < y, and S*xy iff x ≤ y.
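
For a finite relation the union of the powers R¹, R², ... stabilises after finitely many steps, so the transitive closure can be computed by iterating the relative product. The sketch below assumes a finite fragment of the successor relation; the function names and the stopping test are choices made here, not part of the definition.

```python
def relative_product(R, S):
    return {(x, z) for (x, y) in R for (w, z) in S if y == w}


def transitive_closure(R):
    """Union of R^1, R^2, ...; stops once a new power adds no new pairs."""
    closure = set(R)
    power = set(R)
    while True:
        power = relative_product(power, R)     # R^(n+1) = R^n | R
        if power <= closure:                   # no later power can add pairs either
            return closure
        closure |= power


A = range(5)
S = {(x, x + 1) for x in range(4)}             # successor on {0, ..., 4}
plus = transitive_closure(S)
star = plus | {(x, x) for x in A}              # reflexive transitive closure

print(sorted(plus))
```

On this fragment the closure is the strict order < and the reflexive transitive closure is ≤, matching the example.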


Summary 


A relation R on a set A is a way of relating elements of A. We 
write Rxy if the relation holds between x and y. Formally, we can 
consider R as the set of pairs (x,y) ∈ A² such that Rxy. Being
less than, greater than, equal to, evenly dividing, being the same 
length as, a subset of, and the same size as are all important 
examples of relations (on sets of numbers, strings, or of sets). 
Graphs are a general way of visually representing relations. But 
a graph can also be seen as a binary relation (the edge relation) 
together with the underlying set of vertices. 

Some relations share certain features which make them especially
interesting or useful. A relation R is reflexive if everything
is R-related to itself; symmetric, if with Rxy also Ryx holds for 
any x and y; and transitive if Rxy and Ryz guarantees Rxz. Re- 
lations that have all three of these properties are equivalence 
relations. A relation is anti-symmetric if Rxy and Ryx guaran- 
tees x = y. Partial orders are those relations that are reflexive, 
anti-symmetric, and transitive. A linear order is any partial or- 
der which satisfies that for any x and y, either Rxy or x = y or 
Ryx. (Generally, a relation with this property is connected). 

Since relations are sets (of pairs), they can be operated on as 
sets (e.g., we can form the union and intersection of relations). 
We can also chain them together (relative product R | S). If we
form the relative product of R with itself arbitrarily many times
we get the transitive closure R⁺ of R.


Problems 


Problem 2.1. List the elements of the relation ⊆ on the set
℘({a,b,c}).


Problem 2.2. Give examples of relations that are (a) reflex- 
ive and symmetric but not transitive, (b) reflexive and anti- 
symmetric, (c) anti-symmetric, transitive, but not reflexive, and 
(d) reflexive, symmetric, and transitive. Do not use relations on 
numbers or sets. 


Problem 2.3. Show that ≡_n is an equivalence relation, for any
n ∈ N, and that N/≡_n has exactly n members.


Problem 2.4. Give a proof of Proposition 2.26. 


Problem 2.5. Consider the less-than-or-equal-to relation ≤ on
the set {1,2,3,4} as a graph and draw the corresponding diagram.


Problem 2.6. Show that the transitive closure of R is in fact tran- 
sitive. 


CHAPTER 3 


Functions 


3.1 Basics 


A function is a map which sends each element of a given set to a 
specific element in some (other) given set. For instance, the op- 
eration of adding 1 defines a function: each number n is mapped 
to a unique number n + 1.

More generally, functions may take pairs, triples, etc., as in- 
puts and return some kind of output. Many functions are familiar 
to us from basic arithmetic. For instance, addition and multipli- 
cation are functions. They take in two numbers and return a 
third. 

In this mathematical, abstract sense, a function is a black box: 
what matters is only what output is paired with what input, not 
the method for calculating the output. 


Definition 3.1 (Function). A function f: A → B is a mapping
of each element of A to an element of B.

We call A the domain of f and B the codomain of f. The
elements of A are called inputs or arguments of f, and the element
of B that is paired with an argument x by f is called the value
of f for argument x, written f(x).

The range ran(f) of f is the subset of the codomain consisting
of the values of f for some argument; ran(f) = {f(x) : x ∈ A}.


Figure 3.1: A function is a mapping of each element of one set to an element of
another. An arrow points from an argument in the domain to the corresponding 
value in the codomain. 


The diagram in Figure 3.1 may help to think about functions. 
The ellipse on the left represents the function’s domain; the el- 
lipse on the right represents the function’s codomain; and an ar- 
row points from an argument in the domain to the corresponding 
value in the codomain. 


Example 3.2. Multiplication takes pairs of natural numbers as 
inputs and maps them to natural numbers as outputs, so goes 
from N × N (the domain) to N (the codomain). As it turns out,
the range is also N, since every n ∈ N is n × 1.


Example 3.3. Multiplication is a function because it pairs each 
input—each pair of natural numbers—with a single output: 
×: N² → N. By contrast, the square root operation applied to
the domain N is not functional, since each positive integer n has
two square roots: √n and −√n. We can make it functional by
only returning the non-negative square root: √: N → R.


Example 3.4. The relation that pairs each student in a class with 
their final grade is a function—no student can get two different 
final grades in the same class. The relation that pairs each student 
in a class with their parents is not a function: students can have 
zero, or two, or more parents. 


We can define functions by specifying in some precise way 
what the value of the function is for every possible argument.
Different ways of doing this are by giving a formula, describing
a method for computing the value, or listing the values for each 
argument. However functions are defined, we must make sure 
that for each argument we specify one, and only one, value. 


Example 3.5. Let f: N → N be defined such that f(x) = x + 1.
This is a definition that specifies f as a function which takes in
natural numbers and outputs natural numbers. It tells us that,
given a natural number x, f will output its successor x + 1. In
this case, the codomain N is not the range of f, since the natural
number 0 is not the successor of any natural number. The range
of f is the set of all positive integers, Z⁺.


Example 3.6. Let g: N → N be defined such that g(x) = x + 2 − 1.
This tells us that g is a function which takes in natural numbers
and outputs natural numbers. Given a natural number x, g will
output the predecessor of the successor of the successor of x, i.e.,
x + 1.


We just considered two functions, f and g, with different def- 
initions. However, these are the same function. After all, for any 
natural number n, we have that f(n) = n + 1 = n + 2 − 1 = g(n).
Otherwise put: our definitions for f and g specify the same map- 
ping by means of different equations. Implicitly, then, we are 
relying upon a principle of extensionality for functions, 


if ∀x f(x) = g(x), then f = g
provided that f and g share the same domain and codomain. 
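
The point that f and g are one function given by two definitions can be illustrated by comparing their values on a finite sample of arguments. This is only a sketch: agreement on a sample is evidence, not a proof of equality on all of N, and the sample size and the helper `graph` are assumptions made here.

```python
def f(x):
    return x + 1


def g(x):
    return x + 2 - 1


def graph(fn, domain):
    """The set of argument/value pairs of fn on the given finite domain."""
    return {(x, fn(x)) for x in domain}


sample = range(1000)   # a finite sample of N, chosen for illustration

# Different definitions, same pairs: extensionally there is nothing to tell them apart.
print(graph(f, sample) == graph(g, sample))
```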


Example 3.7. We can also define functions by cases. For instance,
we could define h: N → N by


h(x) = x/2        if x is even
h(x) = (x + 1)/2  if x is odd.


Since every natural number is either even or odd, the output of
this function will always be a natural number. Just remember that
if you define a function by cases, every possible input must fall
into exactly one case. In some cases, this will require a proof that
the cases are exhaustive and exclusive.


Figure 3.2: A surjective function has every element of the codomain as a value.


3.2 Kinds of Functions 


It will be useful to introduce a kind of taxonomy for some of the 
kinds of functions which we encounter most frequently. 

To start, we might want to consider functions which have the 
property that every member of the codomain is a value of the 
function. Such functions are called surjective, and can be pic- 
tured as in Figure 3.2. 


Definition 3.8 (Surjective function). A function f: A → B is
surjective iff B is also the range of f, i.e., for every y ∈ B there is
at least one x ∈ A such that f(x) = y, or in symbols:


(∀y ∈ B)(∃x ∈ A) f(x) = y.


We call such a function a surjection from A to B. 


If you want to show that f is a surjection, then you need to 
show that every object in f’s codomain is the value of f(x) for 
some input x. 

Note that any function induces a surjection. After all, given a
function f: A → B, let f′: A → ran(f) be defined by f′(x) =
f(x). Since ran(f) is defined as {f(x) ∈ B : x ∈ A}, this function
f′ is guaranteed to be a surjection.


Figure 3.3: An injective function never maps two different arguments to the
same value.

Now, any function maps each possible input to a unique out- 
put. But there are also functions which never map different inputs 
to the same outputs. Such functions are called injective, and can 
be pictured as in Figure 3.3. 


Definition 3.9 (Injective function). A function f: A → B is
injective iff for each y ∈ B there is at most one x ∈ A such
that f(x) = y. We call such a function an injection from A to B.


If you want to show that f is an injection, you need to show 
that for any elements x and y of f’s domain, if f(x) = f(y), then 
x = y.

Example 3.10. The constant function f: N → N given by
f(x) = 1 is neither injective, nor surjective.

The identity function f: N → N given by f(x) = x is both
injective and surjective.

The successor function f: N → N given by f(x) = x + 1 is
injective but not surjective.

The function f: N → N defined by:


f(x) = x/2        if x is even
f(x) = (x + 1)/2  if x is odd.


is surjective, but not injective.


Figure 3.4: A bijective function uniquely pairs the elements of the codomain
with those of the domain.


Often enough, we want to consider functions which are both 
injective and surjective. We call such functions bijective. They 
look like the function pictured in Figure 3.4. Bijections are also 
sometimes called one-to-one correspondences, since they uniquely 
pair elements of the codomain with elements of the domain. 


Definition 3.11 (Bijection). A function f: A → B is bijective
iff it is both surjective and injective. We call such a function
a bijection from A to B (or between A and B).
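
On finite domains and codomains, injectivity, surjectivity, and bijectivity can be tested by inspecting the values directly. The sketch below uses finite stand-ins for the functions on N discussed in this section; the function names, the six-element domain, and the codomains are all assumptions for illustration.

```python
def is_injective(f, domain):
    values = [f(x) for x in domain]
    return len(values) == len(set(values))     # no value is hit twice


def is_surjective(f, domain, codomain):
    return {f(x) for x in domain} == set(codomain)


def is_bijective(f, domain, codomain):
    return is_injective(f, domain) and is_surjective(f, domain, codomain)


def halve_up(x):
    """x/2 for even x and (x + 1)/2 for odd x."""
    return x // 2 if x % 2 == 0 else (x + 1) // 2


A = range(6)
print(is_bijective(lambda x: x, A, A))
print(is_injective(halve_up, A), is_surjective(halve_up, A, range(4)))
```

Note that whether a function counts as surjective depends on the codomain it is declared to have, which is why the codomain is an explicit argument.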


3.3 Functions as Relations 


A function which maps elements of A to elements of B obviously 
defines a relation between A and B, namely the relation which 
holds between x and y iff f(x) = y. In fact, we might even (if we
are interested in reducing the building blocks of mathematics, for
instance) identify the function f with this relation, i.e., with a
set of pairs. This then raises the question: which relations define 
functions in this way? 


Definition 3.12 (Graph of a function). Let f: A → B be a
function. The graph of f is the relation R_f ⊆ A × B defined
by


R_f = {(x,y) : f(x) = y}.


The graph of a function is uniquely determined, by extensionality.
Moreover, extensionality (on sets) will immediately vindicate
the implicit principle of extensionality for functions, whereby
if f and g share a domain and codomain then they are identical 
if they agree on all values. 

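Similarly, if a relation is “functional”, then it is the graph of
a function.


Proposition 3.13. Let R ⊆ A × B be such that:

1. If Rxy and Rxz then y = z; and
2. for every x ∈ A there is some y ∈ B such that (x,y) ∈ R.

Then R is the graph of the function f: A → B defined by f(x) = y
iff Rxy.


Proof. Suppose there is a y such that Rxy. If there were another
z ≠ y such that Rxz, the condition on R would be violated.
Hence, if there is a y such that Rxy, this y is unique, and so
f is well-defined. Obviously, R_f = R. □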
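
For a finite relation, being “functional” can be checked pair by pair, and a functional relation can then be read as a lookup table. The sketch below represents the resulting function as a Python dict; the names `is_functional` and `function_from_graph`, and the sample relations, are assumptions introduced for illustration.

```python
def is_functional(R, A):
    """Every x in A is paired with exactly one y."""
    return all(len({y for (u, y) in R if u == x}) == 1 for x in A)


def function_from_graph(R, A):
    """Read a functional relation on A as a function, represented as a dict."""
    if not is_functional(R, A):
        raise ValueError("relation is not the graph of a function on A")
    return {x: y for (x, y) in R if x in A}


A = {1, 2, 3}
R = {(1, "a"), (2, "b"), (3, "a")}
two_values = {(1, "a"), (1, "b"), (2, "a"), (3, "a")}   # 1 has two values
no_value = {(1, "a")}                                    # 2 and 3 have none

f = function_from_graph(R, A)
print(f[1], f[2], f[3])
```

Turning the dict back into a set of pairs gives R again, which is the finite counterpart of the observation that the graph of the function so defined is R.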


Every function f: A → B has a graph, i.e., a relation on A × B
defined by f(x) = y. On the other hand, every relation R ⊆ A × B
with the properties given in Proposition 3.13 is the graph of a
function f: A → B. Because of this close connection between
functions and their graphs, we can think of a function simply as 
its graph. In other words, functions can be identified with certain 
relations, i.e., with certain sets of tuples. We can now consider 
performing similar operations on functions as we performed on 
relations (see section 2.6). In particular: 


Definition 3.14. Let f: A → B be a function with C ⊆ A.
The restriction of f to C is the function f↾C: C → B defined
by (f↾C)(x) = f(x) for all x ∈ C. In other words, f↾C = {(x,y) ∈
R_f : x ∈ C}.
The application of f to C is f[C] = {f(x) : x ∈ C}. We also call
this the image of C under f.


It follows from these definitions that ran(f) = f[dom(f)],
for any function f. These notions are exactly as one would ex- 
pect, given the definitions in section 2.6 and our identification of 
functions with relations. But two other operations—inverses and 
relative products—require a little more detail. We will provide 
that in section 3.4 and section 3.5. 


3.4 Inverses of Functions


We think of functions as maps. An obvious question to ask about 
functions, then, is whether the mapping can be “reversed.” For 
instance, the successor function f(x) = x + 1 can be reversed, in 
the sense that the function g(y) = y — 1 “undoes” what f does. 

But we must be careful. Although the definition of g defines 
a function Z — Z, it does not define a function N — N, since 
g(0) ∉ N. So even in simple cases, it is not quite obvious whether
a function can be reversed; it may depend on the domain and 
codomain. 

This is made more precise by the notion of an inverse of a 
function. 


Definition 3.15. A function g: B → A is an inverse of a function
f: A → B if f(g(y)) = y and g(f(x)) = x for all x ∈ A and y ∈ B.


If f has an inverse g, we often write f⁻¹ instead of g.
Now we will determine when functions have inverses. A good
candidate for an inverse of f: A → B is g: B → A “defined by”


g(y) = “the” x such that f(x) = y. 


But the scare quotes around “defined by” (and “the”) suggest 
that this is not a definition. At least, it will not always work, with 
complete generality. For, in order for this definition to specify a
function, there has to be one and only one x such that f(x) = y;
the output of g has to be uniquely specified. Moreover, it has to
be specified for every y ∈ B. If there are x1 and x2 ∈ A with
x1 ≠ x2 but f(x1) = f(x2), then g(y) would not be uniquely
specified for y = f(x1) = f(x2). And if there is no x at all such
that f(x) = y, then g(y) is not specified at all. In other words, 
for g to be defined, f must be both injective and surjective. 

Let’s go slowly. We'll divide the question into two: Given a 
function f: A → B, when is there a function g: B → A so that
g(f(x)) = x? Such a g “undoes” what f does, and is called a left
inverse of f. Secondly, when is there a function h: B → A so that
f(h(y)) = y? Such an h is called a right inverse of f; here f “undoes”
what h does.


Proof. Suppose that f: A → B is injective. Consider a y ∈ B.
If y ∈ ran(f), there is an x ∈ A so that f(x) = y. Because f
is injective, there is only one such x ∈ A. Then we can define:
g(y) = x, i.e., g(y) is “the” x ∈ A such that f(x) = y. If y ∉ ran(f),
we can map it to any a ∈ A. So, we can pick an a ∈ A and define
g: B → A by:

    g(y) = x   if f(x) = y
    g(y) = a   if y ∉ ran(f).

It is defined for all y ∈ B, since for each such y ∈ ran(f) there is
exactly one x ∈ A such that f(x) = y. By definition, if y = f(x),
then g(y) = x, i.e., g(f(x)) = x. □


Proposition 3.17. If f: A → B is surjective, then f has a right
inverse h: B → A.


Proof. Suppose that f: A → B is surjective. Consider a y ∈ B.
Since f is surjective, there is an x_y ∈ A with f(x_y) = y. Then we
can define: h(y) = x_y, i.e., for each y ∈ B we choose some x ∈ A
so that f(x) = y; since f is surjective there is always at least one
to choose from.¹ By definition, if x = h(y), then f(x) = y, i.e., for
any y ∈ B, f(h(y)) = y. □
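
For finite sets, the constructions in the two proofs above can be carried out mechanically. The following Python sketch is only an illustration: representing a function as a dictionary, and the names left_inverse and right_inverse, are assumptions made here and do not come from the text.

```python
def left_inverse(f, B, a):
    """Given an injective f (a dict defined on A), return g: B -> A
    with g[f[x]] == x. Elements of B outside ran(f) go to the fixed a."""
    g = {y: a for y in B}        # default value for every y in B
    for x, y in f.items():
        g[y] = x                 # "the" x with f(x) = y
    return g


def right_inverse(f):
    """Given a surjective f (a dict defined on A), return h: B -> A
    with f[h[y]] == y, choosing the first x found with f[x] == y."""
    h = {}
    for x, y in f.items():
        h.setdefault(y, x)       # keep the first choice, ignore later ones
    return h


succ = {0: 1, 1: 2, 2: 3}                        # injective, not surjective
g = left_inverse(succ, B={0, 1, 2, 3}, a=0)
parity = {0: "even", 1: "odd", 2: "even", 3: "odd"}   # surjective, not injective
h = right_inverse(parity)
```

Here g(f(x)) = x holds for every x in the domain of succ, and f(h(y)) = y holds for both values of parity. For a finite domain the choices in right_inverse can be made one at a time, so nothing like the Axiom of Choice is needed.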


By combining the ideas in the previous proofs, we now get
that every bijection has an inverse, i.e., there is a single function
which is both a left and right inverse of f.


Proof. Exercise. □


There is a slightly more general way to extract inverses. We
saw in section 3.2 that every function f induces a surjection
f′: A → ran(f) by letting f′(x) = f(x) for all x ∈ A. Clearly,
if f is injective, then f′ is bijective, so that it has a unique
inverse by Proposition 3.18. By a very minor abuse of notation, we
sometimes call the inverse of f′ simply “the inverse of f.”


Proof. Exercise. □


¹ Since f is surjective, for every y ∈ B the set {x : f(x) = y} is nonempty.
Our definition of h requires that we choose a single x from each of these sets.
That this is always possible is actually not obvious: the possibility of making
these choices is simply assumed as an axiom. In other words, this proposition
assumes the so-called Axiom of Choice, an issue we will gloss over. However,
in many specific cases, e.g., when A = N or is finite, or when f is bijective, the
Axiom of Choice is not required. (In the particular case when f is bijective,
for each y ∈ B the set {x : f(x) = y} has exactly one element, so that there is
no choice to make.)

Proof. Suppose g and h are both inverses of f. Then in particular
g is a left inverse of f and h is a right inverse. By Proposition 3.19,
g = h. □


3.5 Composition of Functions


We saw in section 3.4 that the inverse f⁻¹ of a bijection f is itself
a function. Another operation on functions is composition: we 
can define a new function by composing two functions, f and g, 
i.e., by first applying f and then g. Of course, this is only possible 
if the ranges and domains match, i.e., the range of f must be a 
subset of the domain of g. This operation on functions is the 
analogue of the operation of relative product on relations from 
section 2.6. 

A diagram might help to explain the idea of composition. In
Figure 3.5, we depict two functions f: A → B and g: B → C
and their composition (g ∘ f). The function (g ∘ f): A → C
pairs each element of A with an element of C. We specify which
element of C an element of A is paired with as follows: given an
input x ∈ A, first apply the function f to x, which will output
some f(x) = y ∈ B, then apply the function g to y, which will
output some g(f(x)) = g(y) = z ∈ C.


Definition 3.21 (Composition). Let f: A → B and g: B → C
be functions. The composition of f with g is g ∘ f: A → C, where
(g ∘ f)(x) = g(f(x)).
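
The definition translates directly into code. The sketch below is an illustration only; the helper name compose and the two sample functions (squaring and adding 3) are choices made here, not taken from the text.

```python
def compose(g, f):
    """Return g ∘ f: the function that first applies f, then g."""
    return lambda x: g(f(x))


def square(x):
    return x * x


def add_three(x):
    return x + 3


add_three_after_square = compose(add_three, square)   # x ↦ x² + 3
square_after_add_three = compose(square, add_three)   # x ↦ (x + 3)²
```

Applied to 2, the first composition gives 7 and the second gives 25, which shows that the order of composition matters: g ∘ f and f ∘ g are in general different functions.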


Example 3.22. Consider the functions f(x) = x + 1, and g(x) =
2x. Since (g ∘ f)(x) = g(f(x)), for each input x you must first
take its successor, then multiply the result by two. So their
composition is given by (g ∘ f)(x) = 2(x + 1).

Figure 3.5: The composition g ∘ f of two functions f and g.


3.6 Partial Functions 


It is sometimes useful to relax the definition of function so that 
it is not required that the output of the function is defined for all 
possible inputs. Such mappings are called partial functions. 


Definition 3.23. A partial function f: A ⇸ B is a mapping
which assigns to every element of A at most one element of B. If
f assigns an element of B to x ∈ A, we say f(x) is defined, and
otherwise undefined. If f(x) is defined, we write f(x) ↓, otherwise
f(x) ↑. The domain of a partial function f is the subset of A
where it is defined, i.e., dom(f) = {x ∈ A : f(x) ↓}.


Example 3.24. Every function f: A → B is also a partial function.
Partial functions that are defined everywhere on A (i.e.,
what we so far have simply called a function) are also called
total functions.


Example 3.25. The partial function f: R ⇸ R given by f(x) =
1/x is undefined for x = 0, and defined everywhere else.

Definition 3.26 (Graph of a partial function). Let f: A ⇸ B
be a partial function. The graph of f is the relation R_f ⊆ A × B
defined by

R_f = {(x, y) : f(x) = y}.
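
One way to experiment with partial functions is to let a program return a special value when the function is undefined. The sketch below makes that assumption explicit: None plays the role of "undefined", and the names reciprocal and is_defined, as well as the finite sample of inputs, are illustrative choices and not part of the text.

```python
from typing import Optional


def reciprocal(x: float) -> Optional[float]:
    """The partial function f(x) = 1/x; None stands for 'f(x) is undefined'."""
    return None if x == 0 else 1 / x


def is_defined(f, x) -> bool:
    """True iff f(x) is defined, i.e. f(x) ↓."""
    return f(x) is not None


sample = [-2, -1, 0, 1, 2]                             # finitely many inputs
domain = [x for x in sample if is_defined(reciprocal, x)]
graph = {(x, reciprocal(x)) for x in domain}           # pairs (x, y) with f(x) = y
```

On this sample, domain contains every input except 0, and graph is the part of R_f that lies over the sample.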


Proof. Suppose there is a y such that Rxy. If there were another
y′ ≠ y such that Rxy′, the condition on R would be violated.
Hence, if there is a y such that Rxy, that y is unique, and so f is
well-defined. Obviously, R_f = R and f is total if R is serial. □


Summary 


A function f: A → B maps every element of the domain A to a
unique element of the codomain B. If x ∈ A, we call the y that f
maps x to the value f(x) of f for argument x. If A is a set of
pairs, we can think of the function f as taking two arguments.
The range ran(f) of f is the subset of B that consists of all the
values of f.

If ran(f) = B then f is called surjective. The value f(x) is
unique in that f maps x to only one f(x), never more than one.
If f(x) is also unique in the sense that no two different arguments
are mapped to the same value, f is called injective. Functions
which are both injective and surjective are called bijective.

Bijective functions have a unique inverse function f⁻¹. Functions
can also be chained together: the function (g ∘ f) is the
composition of f with g. Compositions of injective functions are
injective, and of surjective functions are surjective, and (f⁻¹ ∘ f)
is the identity function.

If we relax the requirement that f must have a value for every
x ∈ A, we get the notion of a partial function. If f: A ⇸ B
is partial, we say f(x) is defined, f(x) ↓, if f has a value for
argument x, and otherwise we say that f(x) is undefined, f(x) ↑.
Any (partial) function f is associated with the graph R_f of f,
the relation that holds iff f(x) = y.


Problems 


Problem 3.1. Show that if f: A → B has a left inverse g, then
f is injective.


Problem 3.2. Show that if f: A → B has a right inverse h, then
f is surjective.


Problem 3.3. Prove Proposition 3.18. You have to define f⁻¹,
show that it is a function, and show that it is an inverse of f, i.e.,
f⁻¹(f(x)) = x and f(f⁻¹(y)) = y for all x ∈ A and y ∈ B.


Problem 3.4. Prove Proposition 3.19.


Problem 3.5. Show that if f: A → B and g: B → C are both
injective, then g ∘ f: A → C is injective.


Problem 3.6. Show that if f: A → B and g: B → C are both
surjective, then g ∘ f: A → C is surjective.


Problem 3.7. Suppose f: A → B and g: B → C. Show that the
graph of g ∘ f is R_f | R_g.


Problem 3.8. Given f: A ⇸ B, define the partial function
g: B ⇸ A by: for any y ∈ B, if there is a unique x ∈ A such
that f(x) = y, then g(y) = x; otherwise g(y) ↑. Show that if f is
injective, then g(f(x)) = x for all x ∈ dom(f), and f(g(y)) = y
for all y ∈ ran(f).


CHAPTER 4 


The Size of Sets 


4.1 Introduction 


When Georg Cantor developed set theory in the 1870s, one of his 
aims was to make palatable the idea of an infinite collection—an 
actual infinity, as the medievals would say. A key part of this was 
his treatment of the size of different sets. If a, b and c are all
distinct, then the set {a,b,c} is intuitively larger than {a,b}. But 
what about infinite sets? Are they all as large as each other? It 
turns out that they are not. 

The first important idea here is that of an enumeration. We 
can list every finite set by listing all its elements. For some infinite 
sets, we can also list all their elements if we allow the list itself 
to be infinite. Such sets are called countable. Cantor’s surprising 
result, which we will fully understand by the end of this chapter, 
was that some infinite sets are not countable. 


4.2 Enumerations and Countable Sets 


We’ve already given examples of sets by listing their elements. 
Let’s discuss in more general terms how and when we can list the 
elements of a set, even if that set is infinite. 


Definition 4.1 (Enumeration, informally). Informally, an
enumeration of a set A is a list (possibly infinite) of elements
of A such that every element of A appears on the list at some
finite position. If A has an enumeration, then A is said to be
countable.


A couple of points about enumerations: 


1. We count as enumerations only lists which have a beginning 
and in which every element other than the first has a single 
element immediately preceding it. In other words, there 
are only finitely many elements between the first element 
of the list and any other element. In particular, this means 
that every element of an enumeration has a finite position: 
the first element has position 1, the second position 2, etc. 


2. We can have different enumerations of the same set A which 
differ by the order in which the elements appear: 4, 1, 25, 
16, 9 enumerates the (set of the) first five square numbers 
just as well as 1, 4, 9, 16, 25 does. 


3. Redundant enumerations are still enumerations: 1, 1, 2, 2, 
3, 3,... enumerates the same set as 1, 2, 3, ... does. 


4. Order and redundancy do matter when we specify an enu- 
meration: we can enumerate the positive integers beginning 
with 1, 2, 3,1, ..., but the pattern is easier to see when enu- 
merated in the standard way as 1, 2, 3, 4,... 


5. Enumerations must have a beginning: ..., 3, 2, 1 is not 
an enumeration of the positive integers because it has no 
first element. To see how this follows from the informal 
definition, ask yourself, “at what position in the list does 
the number 76 appear?” 


6. The following is not an enumeration of the positive integers:
1, 3, 5, ..., 2, 4, 6, ... The problem is that the even
numbers occur at places ∞ + 1, ∞ + 2, ∞ + 3, rather than at
finite positions.


7. The empty set is enumerable: it is enumerated by the empty 
list! 


Proof. Suppose A has an enumeration x₁, x₂, ... in which each
xᵢ is an element of A. We can remove repetitions from an
enumeration by removing repeated elements. For instance, we can
turn the enumeration into a new one in which we list xᵢ if it is
an element of A that is not among x₁, ..., xᵢ₋₁ or remove xᵢ from
the list if it already appears among x₁, ..., xᵢ₋₁. □


The last argument shows that in order to get a good handle 
on enumerations and countable sets and to prove things about 
them, we need a more precise definition. The following provides 
it. 


Definition 4.3 (Enumeration, formally). An enumeration of a
set A ≠ ∅ is any surjective function f: Z⁺ → A.


Let’s convince ourselves that the formal definition and the
informal definition using a possibly infinite list are equivalent.
First, any surjective function from Z⁺ to a set A enumerates A.
Such a function determines an enumeration as defined informally
above: the list f(1), f(2), f(3), .... Since f is surjective, every
element of A is guaranteed to be the value of f(n) for some n ∈
Z⁺. Hence, every element of A appears at some finite position in
the list. Since the function may not be injective, the list may be
redundant, but that is acceptable (as noted above).

On the other hand, given a list that enumerates all elements
of A, we can define a surjective function f: Z⁺ → A by letting
f(n) be the nth element of the list, or the final element of the
list if there is no nth element. The only case where this does not
produce a surjective function is when A is empty, and hence the
list is empty. So, every non-empty list determines a surjective
function f: Z⁺ → A.


Example 4.5. A function enumerating the positive integers (Z⁺)
is simply the identity function given by f(n) = n. A function
enumerating the natural numbers N is the function g(n) = n − 1.


Example 4.6. The functions f: Z⁺ → Z⁺ and g: Z⁺ → Z⁺ given
by

f(n) = 2n and
g(n) = 2n − 1

enumerate the even positive integers and the odd positive
integers, respectively. However, neither function is an enumeration
of Z⁺, since neither is surjective.


Example 4.7. The function f(n) = (−1)ⁿ⌈(n − 1)/2⌉ (where ⌈x⌉
denotes the ceiling function, which rounds x up to the nearest
integer) enumerates the set of integers Z. Notice how f generates
the values of Z by “hopping” back and forth between positive and
negative integers:

f(1)  f(2)  f(3)  f(4)  f(5)  f(6)  f(7)  ...
 0     1    −1     2    −2     3    −3    ...

You can also think of f as defined by cases as follows:

f(n) = 0            if n = 1
f(n) = n/2          if n is even
f(n) = −(n − 1)/2   if n is odd and > 1

Although it is perhaps more natural when listing the elements
of a set to start counting from the 1st element, mathematicians
like to use the natural numbers N for counting things. They talk
about the 0th, 1st, 2nd, and so on, elements of a list.
Correspondingly, we can define an enumeration as a surjective function from
N to A. Of course, the two definitions are equivalent.
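
The enumeration of Z from Example 4.7 is easy to check numerically. The sketch below is a small sanity check under the assumption that the closed formula and the definition by cases are as stated in the example; the names f_formula and f_cases are introduced here only for illustration.

```python
import math


def f_formula(n):
    """f(n) = (-1)^n * ceil((n - 1) / 2), for n = 1, 2, 3, ..."""
    return (-1) ** n * math.ceil((n - 1) / 2)


def f_cases(n):
    """The same function, defined by cases."""
    if n == 1:
        return 0
    if n % 2 == 0:
        return n // 2
    return -((n - 1) // 2)      # n odd and > 1


first_values = [f_formula(n) for n in range(1, 8)]
hit_so_far = {f_formula(n) for n in range(1, 22)}   # values of f(1), ..., f(21)
```

The first seven values hop between positive and negative integers, and after 21 steps every integer from −10 to 10 has appeared exactly once. This is finite evidence for surjectivity, not a proof of it.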


Proof. Given a surjection f: Z⁺ → A, we can define g(n) = f(n +
1) for all n ∈ N. It is easy to see that g: N → A is surjective.
Conversely, given a surjection g: N → A, define f(n) = g(n − 1). □


This gives us the following result: 

Corollary 4.9. A set A is countable iff it is empty or there is a
surjective function f: N → A.


We discussed above that a list of elements of a set A can
be turned into a list without repetitions. This is also true for
enumerations, but a bit harder to formulate and prove rigorously.
Any function f: Z⁺ → A must be defined for all n ∈ Z⁺. If there
are only finitely many elements in A then we clearly cannot have
a function defined on the infinitely many elements of Z⁺ that
takes as values all the elements of A but never takes the same
value twice. In that case, i.e., in the case where the list without
repetitions is finite, we must choose a different domain for f, one
with only finitely many elements. Not having repetitions means
that f must be injective. Since it is also surjective, we are looking
for a bijection between some finite set {1, ..., n} or Z⁺ and A.

Proof. We define the function g recursively: Let g(1) = f(1). If
g(i) has already been defined, let g(i + 1) be the first value of f(1),
f(2), ... not already among g(1), ..., g(i), if there is one. If A
has just n elements, then g(1), ..., g(n) are all defined, and so
we have defined a function g: {1, ..., n} → A. If A has infinitely
many elements, then for any i there must be an element of A
in the enumeration f(1), f(2), ..., which is not already among
g(1), ..., g(i). In this case we have defined a function g: Z⁺ → A.

The function g is surjective, since any element of A is among
f(1), f(2), ... (since f is surjective) and so will eventually be
a value of g(i) for some i. It is also injective, since if there were
j < i such that g(j) = g(i), then g(i) would already be among
g(1), ..., g(i − 1), contrary to how we defined g. □
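
The recursive definition of g in this proof is essentially an algorithm: run through f(1), f(2), ... and keep only the values not seen before. The following Python sketch illustrates it with a generator; the name without_repetitions and the sample enumeration are assumptions made for the example.

```python
from itertools import count, islice


def without_repetitions(f):
    """Yield g(1), g(2), ...: the values f(1), f(2), ... with repeats skipped."""
    seen = set()
    for n in count(1):
        value = f(n)
        if value not in seen:       # first time this element shows up
            seen.add(value)
            yield value


def redundant(n):
    """A redundant enumeration of Z+: 1, 1, 2, 2, 3, 3, ..."""
    return (n + 1) // 2


first_five = list(islice(without_repetitions(redundant), 5))
```

Here first_five lists 1 through 5 without repeats. If A is finite, the generator produces each element once and then searches forever without yielding anything new. This mirrors the case in the proof where g is only defined on {1, ..., n}.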


Proof. A is countable iff A is empty or there is a surjective
f: Z⁺ → A. By Proposition 4.10, the latter holds iff there is
a bijective function f: Z → A where Z = Z⁺ or Z = {1, ..., n}
for some n ∈ Z⁺. By the same argument as in the proof of
Proposition 4.8, that in turn is the case iff there is a bijection g: N′ → A
where either N′ = N or N′ = {0, ..., n − 1}. □


4.3 Cantor’s Zig-Zag Method


We've already considered some “easy” enumerations. Now we 
will consider something a bit harder. Consider the set of pairs of 
natural numbers, which we defined in section 1.5 thus: 


N × N = {(n, m) : n, m ∈ N}




We can organize these ordered pairs into an array, like so: 


    0     | 1     | 2     | 3     | ...
0 | (0,0) | (0,1) | (0,2) | (0,3) | ...
1 | (1,0) | (1,1) | (1,2) | (1,3) | ...
2 | (2,0) | (2,1) | (2,2) | (2,3) | ...
3 | (3,0) | (3,1) | (3,2) | (3,3) | ...
...


Clearly, every ordered pair in N x N will appear exactly once in 
the array. In particular, (n,m) will appear in the nth row and mth 
column. But how do we organize the elements of such an array 
into a “one-dimensional” list? The pattern in the array below 
demonstrates one way to do this (although of course there are 
many other options): 


    0  | 1  | 2  | 3  | 4  | ...
0 | 0  | 1  | 3  | 6  | 10 | ...
1 | 2  | 4  | 7  | 11 | ...
2 | 5  | 8  | 12 | ...
3 | 9  | 13 | ...
4 | 14 | ...
...


This pattern is called Cantor’s zig-zag method. It enumerates N × N
as follows:


(0,0), (0,1), (1,0), (0,2), (1,1), (2,0), (0,3), (1,2), (2,1), (3,0), ....


And this establishes the following: 

Proposition 4.12. N × N is countable.


Proof. Let f: N → N × N take each k ∈ N to the tuple (n, m) ∈
N × N such that k is the value of the nth row and mth column in
Cantor’s zig-zag array. □

This technique also generalises rather nicely. For example,
we can use it to enumerate the set of ordered triples of natural
numbers, i.e.:


N × N × N = {(n, m, k) : n, m, k ∈ N}


We think of N × N × N as the Cartesian product of N × N with N,
that is,


N³ = (N × N) × N = {((n, m), k) : n, m, k ∈ N}


and thus we can enumerate N³ with an array by labelling one axis
with the enumeration of N, and the other axis with the
enumeration of N²:


        0       | 1       | 2       | 3       | ...
(0,0) | (0,0,0) | (0,0,1) | (0,0,2) | (0,0,3) | ...
(0,1) | (0,1,0) | (0,1,1) | (0,1,2) | (0,1,3) | ...
(1,0) | (1,0,0) | (1,0,1) | (1,0,2) | (1,0,3) | ...
(0,2) | (0,2,0) | (0,2,1) | (0,2,2) | (0,2,3) | ...
...


Thus, by using a method like Cantor’s zig-zag method, we may
similarly obtain an enumeration of N³. And we can keep going,
obtaining enumerations of Nⁿ for any natural number n. So, we
have:

Proposition 4.13. Nⁿ is countable, for every natural number n.


4.4 Pairing Functions and Codes


Cantor’s zig-zag method makes the enumerability of Nⁿ visually
evident. But let us focus on our array depicting N². Following the
zig-zag line in the array and counting the places, we can check
that (1,2) is associated with the number 7. However, it would
be nice if we could compute this more directly. That is, it would
be nice to have to hand the inverse of the zig-zag enumeration,
g: N² → N, such that


g((0,0)) = 0, g((0,1)) = 1, g((1,0)) = 2, ..., g((1,2)) = 7, ...


This would enable us to calculate exactly where (n, m) will occur
in our enumeration.

In fact, we can define g directly by making two observations.
First: if the nth row and mth column contains value v, then the
(n+1)st row and (m−1)st column contains value v + 1. Second: the
first row of our enumeration consists of the triangular numbers,
starting with 0, 1, 3, 6, etc. The kth triangular number is the sum
of the natural numbers ≤ k, which can be computed as k(k + 1)/2.
Putting these two observations together, consider this function:


g(n, m) = (n + m + 1)(n + m)/2 + n


We often just write g(n, m) rather than g((n, m)), since it is easier
on the eyes. This tells you first to determine the (n + m)th triangle
number, and then add n to it. And it populates the array in
exactly the way we would like. So in particular, the pair (1,2) is
sent to (4 · 3)/2 + 1 = 7.
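
The claim that g populates the array "in exactly the way we would like" can be tested for an initial segment of the enumeration. The sketch below is an illustration under the assumption that the zig-zag order runs through the diagonals n + m = 0, 1, 2, ... with n increasing along each diagonal, as in the list given in section 4.3; the helper name zigzag is hypothetical.

```python
def g(n, m):
    """Position of the pair (n, m) in the zig-zag enumeration of N x N."""
    return (n + m + 1) * (n + m) // 2 + n


def zigzag(limit):
    """The first `limit` pairs in zig-zag order: (0,0), (0,1), (1,0), (0,2), ..."""
    pairs = []
    d = 0                                   # d = n + m indexes the diagonal
    while len(pairs) < limit:
        for n in range(d + 1):
            pairs.append((n, d - n))
        d += 1
    return pairs[:limit]


positions_agree = all(g(n, m) == k for k, (n, m) in enumerate(zigzag(500)))
```

For the first 500 pairs, the position of each pair in the zig-zag list coincides with the value of g, and in particular g(1, 2) is 7.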

This function g is the inverse of an enumeration of a set of 
pairs. Such functions are called pairing functions. 


Definition 4.14 (Pairing function). A function f: A × B → N
is an arithmetical pairing function if f is injective. We also say
that f encodes A × B, and that f(x, y) is the code for (x, y).


We can use pairing functions to encode, e.g., pairs of natural
numbers; or, in other words, we can represent each pair of
elements using a single number. Using the inverse of the pairing
function, we can decode the number, i.e., find out which pair it
represents.


4.5 An Alternative Pairing Function


There are other enumerations of N² that make it easier to figure
out what their inverses are. Here is one. Instead of visualizing
the enumeration in an array, start with the list of positive
integers associated with (initially) empty spaces. Imagine filling
these spaces successively with pairs (n, m) as follows. Starting
with the pairs that have 0 in the first place (i.e., pairs (0, m)),
put the first (i.e., (0,0)) in the first empty place, then skip an
empty space, put the second (i.e., (0,1)) in the next empty place,
skip one again, and so forth. The (incomplete) beginning of our
enumeration now looks like this:


1      2      3      4      5      6      7      8      9      10
(0,0)         (0,1)         (0,2)         (0,3)         (0,4)


Repeat this with pairs (1, m) for the places that still remain empty,
again skipping every other empty place:


1      2      3      4      5      6      7      8      9      10
(0,0)  (1,0)  (0,1)         (0,2)  (1,1)  (0,3)         (0,4)  (1,2)


Enter pairs (2, m), (3, m), etc., in the same way. Our completed
enumeration thus starts like this:


1      2      3      4      5      6      7      8      9      10
(0,0)  (1,0)  (0,1)  (2,0)  (0,2)  (1,1)  (0,3)  (3,0)  (0,4)  (1,2)


If we number the cells in the array above according to this
enumeration, we will not find a neat zig-zag line, but this
arrangement:

    0  | 1  | 2  | 3  | 4  | 5  | ...
0 | 1  | 3  | 5  | 7  | 9  | 11 | ...
1 | 2  | 6  | 10 | 14 | 18 | ...
2 | 4  | 12 | 20 | 28 | ...
3 | 8  | 24 | 40 | ...
4 | 16 | 48 | ...
5 | 32 | ...
...


We can see that the pairs in row 0 are in the odd numbered places
of our enumeration, i.e., pair (0, m) is in place 2m + 1; pairs in
the second row, (1, m), are in places whose number is the double
of an odd number, specifically, 2 · (2m + 1); pairs in the third row,
(2, m), are in places whose number is four times an odd number,
4 · (2m + 1); and so on. The factors of (2m + 1) for each row, 1, 2, 4,
8, ..., are exactly the powers of 2: 1 = 2⁰, 2 = 2¹, 4 = 2², 8 = 2³,
... In fact, the relevant exponent is always the first member of
the pair in question. Thus, for pair (n, m) the factor is 2ⁿ. This
gives us the general formula: 2ⁿ · (2m + 1). However, this is a
mapping of pairs to positive integers, i.e., (0,0) has position 1. If
we want to begin at position 0 we must subtract 1 from the result.


This gives us:

Example 4.15. The function h: N² → N given by

h(n, m) = 2ⁿ(2m + 1) − 1

is a pairing function for the set of pairs of natural numbers N².


Accordingly, in our second enumeration of N², the pair (0,0)
has code h(0,0) = 2⁰(2·0 + 1) − 1 = 0; (1,2) has code 2¹ · (2·2 +
1) − 1 = 2·5 − 1 = 9; (2,6) has code 2² · (2·6 + 1) − 1 = 51.

Sometimes it is enough to encode pairs of natural numbers N²
without requiring that the encoding is surjective. Such encodings
have inverses that are only partial functions.

Example 4.16. The function j: N² → N⁺ given by

j(n, m) = 2ⁿ3ᵐ

is an injective function N² → N.
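
The section began by saying that this enumeration makes the inverse easy to find. One way to decode, sketched below, is to count how often 2 divides code + 1 and read m off the remaining odd factor. The decoding routine h_inverse is an illustration written for this purpose, not a definition from the text; h and j follow Examples 4.15 and 4.16.

```python
def h(n, m):
    """The pairing function h(n, m) = 2^n (2m + 1) - 1."""
    return 2 ** n * (2 * m + 1) - 1


def h_inverse(code):
    """Recover (n, m) from code = 2^n (2m + 1) - 1."""
    k = code + 1                 # k = 2^n (2m + 1), a positive integer
    n = 0
    while k % 2 == 0:            # strip off the factors of 2
        k //= 2
        n += 1
    return n, (k - 1) // 2       # what remains is the odd number 2m + 1


def j(n, m):
    """The injective but not surjective encoding j(n, m) = 2^n 3^m."""
    return 2 ** n * 3 ** m


round_trip_ok = all(h(*h_inverse(c)) == c for c in range(1000))
j_codes = {j(n, m) for n in range(10) for m in range(10)}
```

Every code below 1000 decodes to a pair that encodes back to the same number, in line with h being a bijection. For j, the 100 pairs tried receive 100 different codes, but a number such as 5 is not a code at all, so the inverse of j is only a partial function.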


4.6 Uncountable Sets 


Some sets, such as the set Z⁺ of positive integers, are infinite.
So far we’ve seen examples of infinite sets which were all
countable. However, there are also infinite sets which do not have this
property. Such sets are called uncountable.

First of all, it is perhaps already surprising that there are
uncountable sets. For any countable set A there is a surjective
function f: Z⁺ → A. If a set is uncountable there is no such function.
That is, no function mapping the infinitely many elements of Z⁺
to A can exhaust all of A. So there are “more” elements of A than
the infinitely many positive integers.

How would one prove that a set is uncountable? You have to
show that no such surjective function can exist. Equivalently, you
have to show that the elements of A cannot be enumerated in a
one-way infinite list. The best way to do this is to show that every
list of elements of A must leave at least one element out; or that
no function f: Z⁺ → A can be surjective. We can do this using
Cantor’s diagonal method. Given a list of elements of A, say, x₁, x₂,
..., we construct another element of A which, by its construction,
cannot possibly be on that list.

Our first example is the set B^ω of all infinite, non-gappy
sequences of 0’s and 1’s.

Theorem 4.17. B^ω is uncountable.


Proof. Suppose, by way of contradiction, that B^ω is countable,
i.e., suppose that there is a list s₁, s₂, s₃, s₄, ... of all elements
of B^ω. Each of these sᵢ is itself an infinite sequence of 0’s and 1’s.
Let’s call the j-th element of the i-th sequence in this list sᵢ(j).
Then the i-th sequence sᵢ is


sᵢ(1), sᵢ(2), sᵢ(3), ...


We may arrange this list, and the elements of each sequence
sᵢ in it, in an array:


    1     | 2     | 3     | 4     | ...
1 | s₁(1) | s₁(2) | s₁(3) | s₁(4) | ...
2 | s₂(1) | s₂(2) | s₂(3) | s₂(4) | ...
3 | s₃(1) | s₃(2) | s₃(3) | s₃(4) | ...
4 | s₄(1) | s₄(2) | s₄(3) | s₄(4) | ...
...


The labels down the side give the number of the sequence in the
list s₁, s₂, ...; the numbers across the top label the elements of the
individual sequences. For instance, s₁(1) is a name for whatever
number, a 0 or a 1, is the first element in the sequence s₁, and so
on.

Now we construct an infinite sequence, s, of 0’s and 1’s which
cannot possibly be on this list. The definition of s will depend on
the list s₁, s₂, .... Any infinite list of infinite sequences of 0’s and
1’s gives rise to an infinite sequence s which is guaranteed to not
appear on the list.

To define s, we specify what all its elements are, i.e., we
specify s(n) for all n ∈ Z⁺. We do this by reading down the diagonal
of the array above (hence the name “diagonal method”) and then
changing every 1 to a 0 and every 0 to a 1. More abstractly, we
define s(n) to be 0 or 1 according to whether the n-th element of
the diagonal, sₙ(n), is 1 or 0.


s(n) = 1   if sₙ(n) = 0
s(n) = 0   if sₙ(n) = 1


If you like formulas better than definitions by cases, you could
also define s(n) = 1 − sₙ(n).

Clearly s is an infinite sequence of 0’s and 1’s, since it is just
the mirror sequence to the sequence of 0’s and 1’s that appear on
the diagonal of our array. So s is an element of B^ω. But it cannot
be on the list s₁, s₂, ... Why not?

It can’t be the first sequence in the list, s₁, because it differs
from s₁ in the first element. Whatever s₁(1) is, we defined s(1)
to be the opposite. It can’t be the second sequence in the list,
because s differs from s₂ in the second element: if s₂(2) is 0, s(2)
is 1, and vice versa. And so on.

More precisely: if s were on the list, there would be some k
so that s = sₖ. Two sequences are identical iff they agree at every
place, i.e., for any n, s(n) = sₖ(n). So in particular, taking n = k
as a special case, s(k) = sₖ(k) would have to hold. sₖ(k) is either
0 or 1. If it is 0 then s(k) must be 1, since that’s how we defined s. But
if sₖ(k) = 1 then, again because of the way we defined s, s(k) = 0.
In either case s(k) ≠ sₖ(k).

We started by assuming that there is a list of elements of B^ω,
s₁, s₂, .... From this list we constructed a sequence s which we
proved cannot be on the list. But it definitely is a sequence of
0’s and 1’s if all the sᵢ are sequences of 0’s and 1’s, i.e., s ∈ B^ω.
This shows in particular that there can be no list of all elements
of B^ω, since for any such list we could also construct a sequence s
guaranteed to not be on the list, so the assumption that there is
a list of all sequences in B^ω leads to a contradiction. □
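
The diagonal construction does not depend on any particular list, and this can be made concrete in code. In the sketch below an infinite sequence is represented as a Python function from positions 1, 2, 3, ... to bits, and a list of sequences as a function from indices to such functions. These representations, the name diagonal, and the sample list built from binary digits are all assumptions made for illustration.

```python
def diagonal(listing):
    """Given listing(i) = the i-th sequence s_i (a function j -> 0 or 1),
    return the sequence s with s(n) = 1 - s_n(n)."""
    return lambda n: 1 - listing(n)(n)


def binary_digits(i):
    """A sample list: s_i(j) is the j-th binary digit of i, counted from the right."""
    return lambda j: (i >> (j - 1)) & 1


s = diagonal(binary_digits)
differs_on_diagonal = all(s(k) != binary_digits(k)(k) for k in range(1, 200))
```

For every k tried, s disagrees with the k-th listed sequence at position k, so s cannot be the k-th entry of the list. A program can only check finitely many k; the proof above is what covers all of them.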


This proof method is called “diagonalization” because it uses 
the diagonal of the array to define s. Diagonalization need not 
involve the presence of an array: we can show that sets are not 
countable by using a similar idea even when no array and no 
actual diagonal is involved. 

Theorem 4.18. ℘(Z⁺) is uncountable.


Proof. We proceed in the same way, by showing that for every list
of subsets of Z⁺ there is a subset of Z⁺ which cannot be on the
list. Suppose the following is a given list of subsets of Z⁺:

Z₁, Z₂, Z₃, ...

We now define a set Z such that for any n ∈ Z⁺, n ∈ Z iff n ∉ Zₙ:

Z = {n ∈ Z⁺ : n ∉ Zₙ}

Z is clearly a set of positive integers, since by assumption each Zₙ
is, and thus Z ∈ ℘(Z⁺). But Z cannot be on the list. To show
this, we’ll establish that for each k ∈ Z⁺, Z ≠ Zₖ.

So let k ∈ Z⁺ be arbitrary. We’ve defined Z so that for any
n ∈ Z⁺, n ∈ Z iff n ∉ Zₙ. In particular, taking n = k, k ∈ Z
iff k ∉ Zₖ. But this shows that Z ≠ Zₖ, since k is an element of
one but not the other, and so Z and Zₖ have different elements.
Since k was arbitrary, Z is not on the list Z₁, Z₂, ... □


The preceding proof did not mention a diagonal, but you
can think of it as involving a diagonal if you picture it this way:
Imagine the sets Z₁, Z₂, ..., written in an array, where each
element j ∈ Zᵢ is listed in the j-th column. Say the first four sets on
that list are {1,2,3,...}, {2,4,6,...}, {1,2,5}, and {3,4,5,...}.
Then the array would begin with


Z₁ = {1, 2, 3, 4, 5, 6, ...}
Z₂ = {   2,    4,    6, ...}
Z₃ = {1, 2,       5        }
Z₄ = {      3, 4, 5, 6, ...}
...
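
The four example sets can also be used to try out the definition Z = {n ∈ Z⁺ : n ∉ Zₙ} directly. In the sketch below each set is represented by a membership test, since three of the four sets are infinite. This representation and the names Z_list and in_Z are assumptions made for the illustration, and only the first four rows of the list are covered.

```python
# Membership tests for the first four sets on the list.
Z_list = {
    1: lambda n: n >= 1,            # Z1 = {1, 2, 3, ...}
    2: lambda n: n % 2 == 0,        # Z2 = {2, 4, 6, ...}
    3: lambda n: n in {1, 2, 5},    # Z3 = {1, 2, 5}
    4: lambda n: n >= 3,            # Z4 = {3, 4, 5, ...}
}


def in_Z(n):
    """n is in Z iff n is not in Z_n (defined here only for n = 1, ..., 4)."""
    return not Z_list[n](n)


members_up_to_4 = [n for n in sorted(Z_list) if in_Z(n)]
```

Among 1, 2, 3, 4, only 3 ends up in Z, because 3 is the only one of these numbers missing from the set with its own index.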


Then Z is the set obtained by going down the diagonal, leaving
out any numbers that appear along the diagonal and including
those j where the array has a gap in the j-th row/column. In the
above case, we would leave out 1 and 2, include 3, leave out 4,
etc.


4.7 Reduction


We showed ℘(Z⁺) to be uncountable by a diagonalization
argument. We already had a proof that B^ω, the set of all infinite
sequences of 0s and 1s, is uncountable. Here’s another way we
can prove that ℘(Z⁺) is uncountable: Show that if ℘(Z⁺) is
countable then B^ω is also countable. Since we know B^ω is not countable,
℘(Z⁺) can’t be either. This is called reducing one problem to
another. In this case, we reduce the problem of enumerating B^ω
to the problem of enumerating ℘(Z⁺). A solution to the latter (an
enumeration of ℘(Z⁺)) would yield a solution to the former (an
enumeration of B^ω).

How do we reduce the problem of enumerating a set B to
that of enumerating a set A? We provide a way of turning an
enumeration of A into an enumeration of B. The easiest way to
do that is to define a surjective function f: A → B. If x₁, x₂, ...
enumerates A, then f(x₁), f(x₂), ... would enumerate B. In our
case, we are looking for a surjective function f: ℘(Z⁺) → B^ω.


Proof of Theorem 4.18 by reduction. Suppose that ℘(Z⁺) were
countable, and thus that there is an enumeration of it, Z₁,
Z₂, Z₃, ...

Define the function f: ℘(Z⁺) → B^ω by letting f(Z) be the
sequence s_Z such that s_Z(n) = 1 iff n ∈ Z, and s_Z(n) = 0
otherwise. This clearly defines a function, since whenever Z ⊆ Z⁺, any
n ∈ Z⁺ either is an element of Z or isn’t. For instance, the set
2Z⁺ = {2, 4, 6, ...} of positive even numbers gets mapped to the
sequence 010101..., the empty set gets mapped to 0000... and
the set Z⁺ itself to 1111....

It also is surjective: Every sequence of 0s and 1s corresponds
to some set of positive integers, namely the one which has as its
members those integers corresponding to the places where the
sequence has 1s. More precisely, suppose s ∈ B^ω. Define Z ⊆ Z⁺
by:

Z = {n ∈ Z⁺ : s(n) = 1}

Then f(Z) = s, as can be verified by consulting the definition
of f.


Now consider the list


f(Z₁), f(Z₂), f(Z₃), ...


Since f is surjective, every member of B^ω must appear as a value
of f for some argument, and so must appear on the list. This list
must therefore enumerate all of B^ω.

So if ℘(Z⁺) were countable, B^ω would be countable. But B^ω
is uncountable (Theorem 4.17). Hence ℘(Z⁺) is uncountable. □
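
The function f used in this proof, and the set that witnesses its surjectivity, can both be written down as short programs. The sketch below again represents sets of positive integers by membership tests and sequences by functions from positions to bits; these representations and the names f, set_of and prefix are assumptions for illustration only.

```python
def f(Z):
    """Map a set Z (given as a membership test) to its sequence s_Z."""
    return lambda n: 1 if Z(n) else 0


def set_of(s):
    """Map a sequence s back to the set {n : s(n) = 1}."""
    return lambda n: s(n) == 1


def prefix(s, length):
    """The first `length` elements of the sequence s, written as a string."""
    return "".join(str(s(n)) for n in range(1, length + 1))


def evens(n):
    return n % 2 == 0


def empty(n):
    return False


def everything(n):
    return True


def some_sequence(n):
    """An arbitrary sample sequence: 1 exactly at the multiples of 3."""
    return 1 if n % 3 == 0 else 0


recovered = f(set_of(some_sequence))
```

The prefixes of f(evens), f(empty) and f(everything) match the three examples given in the proof, and recovered agrees with some_sequence on every position checked, which illustrates why f(Z) = s for the set Z defined from s.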


It is easy to be confused about the direction the reduction
goes in. For instance, a surjective function g: B^ω → B does not
establish that B is uncountable. (Consider g: B^ω → B defined
by g(s) = s(1), the function that maps a sequence of 0’s and 1’s
to its first element. It is surjective, because some sequences start
with 0 and some start with 1. But B is finite.) Note also that the
function f must be surjective, or otherwise the argument does
not go through: f(x₁), f(x₂), ... would then not be guaranteed
to include all the elements of B. For instance,


h(n) = 000...0   (n 0’s)


defines a function h: Z⁺ → B^ω, but Z⁺ is countable.


4.8 Equinumerosity 


We have an intuitive notion of “size” of sets, which works fine for 
finite sets. But what about infinite sets? If we want to come up 
with a formal way of comparing the sizes of two sets of any size, 
it is a good idea to start by defining when sets are the same size. 
Here is Frege: 


If a waiter wants to be sure that he has laid exactly as 
many knives as plates on the table, he does not need
to count either of them, if he simply lays a knife to the
right of each plate, so that every knife on the table lies 
to the right of some plate. The plates and knives are 
thus uniquely correlated to each other, and indeed 
through that same spatial relationship. (Frege, 1884, 
§70) 


The insight of this passage can be brought out through a formal 
definition: 


Definition 4.19. A is equinumerous with B, written A ≈ B, iff
there is a bijection f: A → B.


Proposition 4.20. Equinumerosity is an equivalence relation.


Proof. We must show that equinumerosity is reflexive, symmetric,
and transitive. Let A, B, and C be sets.

Reflexivity. The identity map Id_A: A → A, where Id_A(x) = x
for all x ∈ A, is a bijection. So A ≈ A.

Symmetry. Suppose A ≈ B, i.e., there is a bijection f: A → B.
Since f is bijective, its inverse f⁻¹ exists and is also bijective.
Hence, f⁻¹: B → A is a bijection, so B ≈ A.

Transitivity. Suppose that A ≈ B and B ≈ C, i.e., there are
bijections f: A → B and g: B → C. Then the composition
g ∘ f: A → C is bijective, so that A ≈ C. □
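
For finite sets, the three constructions used in this proof (identity, inverse, composition) can be checked mechanically. The sketch below represents a function as a Python dictionary; that representation, the helper names, and the example sets are our own assumptions, and a check on one example is of course no substitute for the proof.

```python
def is_bijection(f, A, B):
    """f is a dict; check it is a total, injective, surjective map from A to B."""
    total = set(f) == set(A)
    surjective = set(f.values()) == set(B)
    injective = len(set(f.values())) == len(f)
    return total and surjective and injective

def identity(A):
    """The identity map on A."""
    return {x: x for x in A}

def inverse(f):
    """The inverse of a bijection given as a dict."""
    return {y: x for x, y in f.items()}

def compose(g, f):
    """g after f, i.e., the map x -> g(f(x))."""
    return {x: g[f[x]] for x in f}

A, B, C = {1, 2, 3}, {"a", "b", "c"}, {10, 20, 30}
f = {1: "a", 2: "b", 3: "c"}
g = {"a": 30, "b": 10, "c": 20}

print(is_bijection(identity(A), A, A))    # True (reflexivity)
print(is_bijection(inverse(f), B, A))     # True (symmetry)
print(is_bijection(compose(g, f), A, C))  # True (transitivity)
```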


Proposition 4.21. If A ≈ B, then A is countable if and only if B is.

Proof. Suppose A ≈ B, so there is some bijection f: A → B,
and suppose that A is countable. Then either A = ∅ or there
is a surjective function g: Z⁺ → A. If A = ∅, then B = ∅ also
(otherwise there would be an element y ∈ B but no x ∈ A with
f(x) = y). If, on the other hand, g: Z⁺ → A is surjective, then
f ∘ g: Z⁺ → B is surjective. To see this, let y ∈ B. Since f
is surjective, there is an x ∈ A such that f(x) = y. Since g is
surjective, there is an n ∈ Z⁺ such that g(n) = x. Hence,


(f ∘ g)(n) = f(g(n)) = f(x) = y


and thus f ∘ g is surjective. We have that f ∘ g is an enumeration
of B, and so B is countable.

If B is countable, we obtain that A is countable by repeating
the argument with the bijection f⁻¹: B → A instead of f. □


4.9 Sets of Different Sizes, and Cantor’s Theorem


We have offered a precise statement of the idea that two sets have 
the same size. We can also offer a precise statement of the idea 
that one set is smaller than another. Our definition of “is smaller 
than (or equinumerous)” will require, instead of a bijection be- 
tween the sets, an injection from the first set to the second. If 
such a function exists, the size of the first set is less than or equal 
to the size of the second. Intuitively, an injection from one set 
to another guarantees that the range of the function has at least 
as many elements as the domain, since no two elements of the 
domain map to the same element of the range. 


Definition 4.22. A is no larger than B, written A ⪯ B, iff there is
an injection f: A → B.


It is clear that this is a reflexive and transitive relation, but 
that it is not symmetric (this is left as an exercise). We can also 
introduce a notion, which states that one set is (strictly) smaller 
than another. 


Definition 4.23. A is smaller than B, written A ≺ B, iff there is
an injection f: A → B but no bijection g: A → B, i.e., A ⪯ B
and A ≉ B.

It is clear that this relation is irreflexive and transitive. (This
is left as an exercise.) Using this notation, we can say that a set
A is countable iff A ⪯ N, and that A is uncountable iff N ≺ A.
This allows us to restate Theorem 4.18 as the observation that
Z⁺ ≺ ℘(Z⁺). In fact, Cantor (1892) proved that this last point is
perfectly general:


Theorem 4.24 (Cantor). A ≺ ℘(A), for any set A.


Proof. The map f(x) = {x} is an injection f: A → ℘(A), since if
x ≠ y, then also {x} ≠ {y} by extensionality, and so f(x) ≠ f(y).
So we have that A ⪯ ℘(A).

We will now show that there cannot be a surjective function
g: A → ℘(A), let alone a bijective one, and hence that
A ≉ ℘(A). For suppose that g: A → ℘(A). Since g is total,
every x ∈ A is mapped to a subset g(x) ⊆ A. We can show that g
cannot be surjective. To do this, we define a subset Ā ⊆ A which
by definition cannot be in the range of g. Let


Ā = {x ∈ A : x ∉ g(x)}.


Since g(x) is defined for all x ∈ A, Ā is clearly a well-defined
subset of A. But it cannot be in the range of g. Let x ∈ A be
arbitrary; we will show that Ā ≠ g(x). If x ∈ g(x), then it does
not satisfy x ∉ g(x), and so by the definition of Ā, we have x ∉ Ā.
If x ∈ Ā, it must satisfy the defining property of Ā, i.e., x ∈ A
and x ∉ g(x). Since x was arbitrary, this shows that for each
x ∈ A, x ∈ g(x) iff x ∉ Ā, and so g(x) ≠ Ā. In other words, Ā
cannot be in the range of g, contradicting the assumption that g
is surjective. □
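
For a small finite set, the diagonal construction can be tested exhaustively: we can run through every function g from A to its power set and confirm that the set of all x with x ∉ g(x) is never a value of g. The following is only a sanity check on one three-element example, with helper names of our own choosing; it is not a replacement for the proof, which covers infinite sets as well.

```python
from itertools import combinations, product

def powerset(A):
    """All subsets of the finite set A, as frozensets."""
    items = sorted(A)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def diagonal_set(g, A):
    """The set {x in A : x not in g(x)}, for g given as a dict."""
    return frozenset(x for x in A if x not in g[x])

def diagonal_always_missed(A):
    """Check every g: A -> P(A); the diagonal set must never be a value of g."""
    items = sorted(A)
    for values in product(powerset(A), repeat=len(items)):
        g = dict(zip(items, values))
        if diagonal_set(g, A) in g.values():
            return False
    return True

g = {0: frozenset({0}), 1: frozenset(), 2: frozenset({0, 1, 2})}
print(diagonal_set(g, {0, 1, 2}))           # frozenset({1})
print(diagonal_always_missed({0, 1, 2}))    # True
```

With three elements there are 8 subsets and so 8³ = 512 functions to try, which is why the check finishes instantly.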


It’s instructive to compare the proof of Theorem 4.24 to that
of Theorem 4.18. There we showed that for any list Z₁, Z₂, ..., of
subsets of Z⁺ one can construct a set Z̄ of numbers guaranteed
not to be on the list. It was guaranteed not to be on the list
because, for every n ∈ Z⁺, n ∈ Zₙ iff n ∉ Z̄. This way, there is
always some number that is an element of one of Zₙ or Z̄ but not
the other. We follow the same idea here, except the indices n are
now elements of A instead of Z⁺. The set Ā = {x ∈ A : x ∉ g(x)} is
defined so that it is different from g(x) for each x ∈ A, because
x ∈ g(x) iff x ∉ Ā. Again, there is always an element of A which is
an element of one of g(x) and Ā but not the other. And just as Z̄
therefore cannot be on the list Z₁, Z₂, ..., Ā cannot be in the range of g.

The proof is also worth comparing with the proof of Russell’s 
Paradox, Theorem 1.29. Indeed, Cantor’s Theorem was the in- 
spiration for Russell’s own paradox. 


4.10 The Notion of Size, and Schröder-Bernstein


Here is an intuitive thought: if A is no larger than B and B is no 
larger than A, then A and B are equinumerous. To be honest, if 
this thought were wrong, then we could scarcely justify the thought 
that our defined notion of equinumerosity has anything to do 
with comparisons of “sizes” between sets! Fortunately, though, 
the intuitive thought is correct. This is justified by the
Schröder-Bernstein Theorem.


Theorem 4.25 (Schröder-Bernstein). If A ⪯ B and B ⪯ A, then A ≈ B.


In other words, if there is an injection from A to B, and an in- 
jection from B to A, then there is a bijection from A to B. 

This result, however, is really rather difficult to prove. Indeed,
although Cantor stated the result, others proved it.¹ For now, you
can (and must) take it on trust.

Fortunately, Schröder-Bernstein is correct, and it vindicates our
thinking of the relations we defined, i.e., A ≈ B and A ⪯ B, as having
something to do with “size”. Moreover, Schröder-Bernstein is
very useful. It can be difficult to think of a bijection between two
equinumerous sets. The Schröder-Bernstein Theorem allows us
to break the comparison down into cases so we only have to think
of an injection from the first to the second, and vice-versa.

¹ For more on the history, see, e.g., Potter (2004, pp. 165-6).


Summary 


The size of a set A can be measured by a natural number if the set 
is finite, and sizes can be compared by comparing these numbers. 
If sets are infinite, things are more complicated. The first level of 
infinity is that of countably infinite sets. A set A is countable
if its elements can be arranged in an enumeration, a one-way
infinite list, i.e., when there is a surjective function f: Z⁺ → A. It
is countably infinite if it is countable but not finite. Cantor’s zig-zag
method shows that the set of pairs of elements of countably
infinite sets is also countable; and this can be used to show that
even the set of rational numbers Q is countable.

There are, however, infinite sets that are not countable: these 
sets are called uncountable. There are two ways of showing that 
a set is uncountable: directly, using a diagonal argument, or 
by reduction. To give a diagonal argument, we assume that the 
set A in question is countable, and use a hypothetical enumera- 
tion to define an element of A which, by the very way we define 
it, is guaranteed to be different from every element in the enu- 
meration. So the enumeration can’t be an enumeration of all 
of A after all, and we’ve shown that no enumeration of A can 
exist. A reduction shows that A is uncountable by associating 
every element of A with an element of some known uncountable 
set B in a surjective way. If this is possible, then a hypothetical
enumeration of A would yield an enumeration of B. Since B is 
uncountable, no enumeration of A can exist. 

In general, infinite sets can be compared sizewise: A and 
B are the same size, or equinumerous, if there is a bijection 
between them. We can also define that A is no larger than B 
(A ⪯ B) if there is an injective function from A to B. By the
Schröder-Bernstein Theorem, this in fact provides a sizewise order
of infinite sets. Finally, Cantor’s theorem says that for any
A, A ≺ ℘(A). This is a generalization of our result that ℘(Z⁺) is
uncountable, and shows that there are not just two, but infinitely
many levels of infinity.


Problems 


Problem 4.1. Define an enumeration of the positive squares 1, 
4, 9, 16,... 


Problem 4.2. Show that if A and B are countable, so is A ∪ B.
To do this, suppose there are surjective functions f: Z⁺ → A
and g: Z⁺ → B, and define a surjective function h: Z⁺ → A ∪ B
and prove that it is surjective. Also consider the cases where A
or B = ∅.


Problem 4.3. Show that if B ⊆ A and A is countable, so is B. To
do this, suppose there is a surjective function f: Z⁺ → A. Define
a surjective function g: Z⁺ → B and prove that it is surjective.
What happens if B = ∅?


Problem 4.4. Show by induction on n that if A₁, A₂, ..., Aₙ are
all countable, so is A₁ ∪ ··· ∪ Aₙ. You may assume the fact that if
two sets A and B are countable, so is A ∪ B.


Problem 4.5. According to Definition 4.4, a set A is countable
iff A = ∅ or there is a surjective f: Z⁺ → A. It is also possible to
define “countable set” precisely by: a set is countable iff there
is an injective function g: A → Z⁺. Show that the definitions are
equivalent, i.e., show that there is an injective function g: A → Z⁺
iff either A = ∅ or there is a surjective f: Z⁺ → A.


Problem 4.6. Show that (Z⁺)ⁿ is countable, for every n ∈ N.


Problem 4.7. Show that (Z⁺)* is countable. You may assume
problem 4.6.


Problem 4.8. Give an enumeration of the set of all non-negative
rational numbers.


Problem 4.9. Show that Q is countable. Recall that any rational
number can be written as a fraction z/m with z ∈ Z, m ∈ N⁺.


Problem 4.10. Define an enumeration of B*.


Problem 4.11. Recall from your introductory logic course that
each possible truth table expresses a truth function. In other
words, the truth functions are all functions from B^k → B for
some k. Prove that the set of all truth functions is countable.


Problem 4.12. Show that the set of all finite subsets of an arbi- 
trary infinite countable set is countable. 


Problem 4.13. A subset of N is said to be cofinite iff it is the
complement of a finite subset of N; that is, A ⊆ N is cofinite iff N \ A is
finite. Let J be the set whose elements are exactly the finite and
cofinite subsets of N. Show that J is countable.


Problem 4.14. Show that the countable union of countable sets
is countable. That is, whenever A₁, A₂, ... are sets, and each Aᵢ is
countable, then the union ⋃_{i=1}^∞ Aᵢ of all of them is also countable.
[NB: this is hard!]


Problem 4.15. Let f: A × B → N be an arbitrary pairing function.
Show that the inverse of f is an enumeration of A × B.


Problem 4.16. Specify a function that encodes N³.


Problem 4.17. Show that ℘(N) is uncountable by a diagonal argument.


Problem 4.18. Show that the set of functions f: Z⁺ → Z⁺ is
uncountable by an explicit diagonal argument. That is, show
that if f₁, f₂, ..., is a list of functions and each fᵢ: Z⁺ → Z⁺, then
there is some f: Z⁺ → Z⁺ not on this list.


Problem 4.19. Show that if there is an injective function g: B →
A, and B is uncountable, then so is A. Do this by showing how
you can use g to turn an enumeration of A into one of B.


Problem 4.20. Show that the set of all sets of pairs of positive 
integers is uncountable by a reduction argument. 


Problem 4.21. Show that the set X of all functions f: N → N
is uncountable by a reduction argument. (Hint: give a surjective
function from X to B^ω.)


Problem 4.22. Show that N^ω, the set of infinite sequences of natural
numbers, is uncountable by a reduction argument.


Problem 4.23. Let P be the set of functions from the set of positive
integers to the set {0}, and let Q be the set of partial functions
from the set of positive integers to the set {0}. Show that P is
countable and Q is not. (Hint: reduce the problem of enumerating
B^ω to enumerating Q.)


Problem 4.24. Let S be the set of all surjective functions from
the set of positive integers to the set {0, 1}, i.e., S consists of all
surjective f: Z⁺ → B. Show that S is uncountable.


Problem 4.25. Show that the set R of all real numbers is un- 
countable. 


Problem 4.26. Show that if A ≈ C and B ≈ D, and A ∩ B =
C ∩ D = ∅, then A ∪ B ≈ C ∪ D.


Problem 4.27. Show that if A is infinite and countable, then A ≈
N.


Problem 4.28. Show that there cannot be an injection
g: ℘(A) → A, for any set A. Hint: Suppose g: ℘(A) → A
is injective. Consider D = {g(B) : B ⊆ A and g(B) ∉ B}. Let
x = g(D). Use the fact that g is injective to derive a contradiction.


PART II


First-order Logic


Introduction to First-Order Logic


5.1 First-Order Logic 


You are probably familiar with first-order logic from your first introduction
to formal logic.¹ You may know it as “quantificational
logic” or “predicate logic.” First-order logic, first of all, is a formal
language. That means, it has a certain vocabulary, and its
expressions are strings from this vocabulary. But not every string
is permitted. There are different kinds of permitted expressions:
terms, formulas, and sentences. We are mainly interested in sentences
of first-order logic: they provide us with a formal analogue
of sentences of English, and about them we can ask the questions
a logician typically is interested in. For instance:


* Does B follow from A logically?


* Is A logically true, logically false, or contingent?


* Are A and B equivalent?


¹ In fact, we more or less assume you are! If you’re not, you could review a
more elementary textbook, such as forall x (Magnus et al., 2021).


These questions are primarily questions about the “meaning” 
of sentences of first-order logic. For instance, a philosopher would 
analyze the question of whether B follows logically from A as ask- 
ing: is there a case where A is true but B is false (B doesn’t follow 
from A), or does every case that makes A true also make B true (B 
does follow from A)? But we haven’t been told yet what a “case” 
is—that is the job of semantics. The semantics of first-order logic 
provides a mathematically precise model of the philosopher’s in- 
tuitive idea of “case,” and also—and this is important—of what 
it is for a sentence A to be true in a case. We call the mathemati- 
cally precise model that we will develop a structure. The relation 
which makes “true in” precise is called the relation of satisfaction.
So what we will define is “A is satisfied in M” (in symbols:
M ⊨ A) for sentences A and structures M. Once this is done,
we can also give precise definitions of the other semantical terms
such as “follows from” or “is logically true.” These definitions
will make it possible to settle, again with mathematical precision,
whether, e.g., ∀x (A(x) → B(x)), ∃x A(x) ⊨ ∃x B(x). The answer
will, of course, be “yes.” If you’ve already been trained to symbolize
sentences of English in first-order logic, you will recognize
this as the symbolization of, say, “All ants are insects, there
are ants, therefore there are insects.” That is obviously a valid
argument, and so our mathematical model of “follows from” for
our formal language should give the same answer.

Another topic you probably remember from your first intro- 
duction to formal logic is that there are derivations. If you have 
taken a first formal logic course, your instructor will have made 
you practice finding such derivations, perhaps even a derivation 
that shows that the above entailment holds. There are many dif- 
ferent ways to give derivations: you may have done something 
called “natural deduction” or “truth trees,” but there are many 
others. The purpose of derivation systems is to provide tools with
which the logicians’ questions above can be answered: e.g.,
a natural deduction derivation in which ∀x (A(x) → B(x)) and
∃x A(x) are premises and ∃x B(x) is the conclusion (last line)
verifies that ∃x B(x) logically follows from ∀x (A(x) → B(x)) and
∃x A(x).

But why is that? On the face of it, derivation systems have 
nothing to do with semantics: giving a formal derivation merely 
involves arranging symbols in certain rule-governed ways; they 
don’t mention “cases” or “true in” at all. The connection between 
derivation systems and semantics has to be established by a metalogical
investigation. What’s needed is a mathematical proof, e.g.,
that a formal derivation of ∃x B(x) from premises ∀x (A(x) →
B(x)) and ∃x A(x) is possible if, and only if, ∀x (A(x) → B(x))
and ∃x A(x) together entail ∃x B(x). Before this can be done,
however, a lot of painstaking work has to be carried out to get 
the definitions of syntax and semantics correct. 


5.2 Syntax 


We first must make precise what strings of symbols count as 
sentences of first-order logic. We'll do this later; for now 
we’ll just proceed by example. The basic building blocks of first-order
logic, its vocabulary, divide into two parts. The first
part is the symbols we use to say specific things or to pick out specific
things. We pick out things using constant symbols, and we
say stuff about the things we pick out using predicate symbols.
E.g., we might use a as a constant symbol to pick out a single
thing, and then say something about it using the sentence P(a).
If you have meanings for “a” and “P” in mind, you can read P(a)
as a sentence of English (and you probably have done so when
you first learned formal logic). Once you have such simple sentences
of first-order logic, you can build more complex ones using
the second part of the vocabulary: the logical symbols (connectives
and quantifiers). So, for instance, we can form expressions
like (P(a) ∧ Q(b)) or ∃x P(x).

In order to provide the precise definitions of semantics and 
the rules of our derivation systems required for rigorous metalogical
study, we first of all have to give a precise definition of
what counts as a sentence of first-order logic. The basic idea 
is easy enough to understand: there are some simple sentences 
we can form from just predicate symbols and constant symbols, 
such as P(a). And then from these we form more complex ones 
using the connectives and quantifiers. But what exactly are the 
rules by which we are allowed to form more complex sentences? 
These must be specified, otherwise we have not defined “sentence 
of first-order logic” precisely enough. There are a few issues. 
The first one is to get the right strings to count as sentences. 
The second one is to do this in such a way that we can give 
mathematical proofs about all sentences. Finally, we'll have to 
also give precise definitions of some rudimentary operations with 
sentences, such as “replace every x in A by b.” The trouble is that 
the quantifiers and variables we have in first-order logic make 
it not entirely obvious how this should be done. E.g., should 
∃x P(a) count as a sentence? What about ∃x ∃x P(x)? What
should the result of “replace x by b in (P(x) ∧ ∃x P(x))” be?


5.3 Formulas


Here is the approach we will use to rigorously specify sentences 
of first-order logic and to deal with the issues arising from the use 
of variables. We first define a different set of expressions: formu- 
las. Once we’ve done that, we can consider the role variables play 
in them—and on the basis of some other ideas, namely those of 
“free” and “bound” variables, we can define what a sentence is 
(namely, a formula without free variables). We do this not just be- 
cause it makes the definition of “sentence” more manageable, but 
also because it will be crucial to the way we define the semantic 
notion of satisfaction. 

Let’s define “formula” for a simple first-order language, one
containing only a single predicate symbol P and a single constant
symbol a, and only the logical symbols ¬, ∧, and ∃. Our
full definitions will be much more general: we’ll allow infinitely
many predicate symbols and constant symbols. In fact, we will
also consider function symbols which can be combined with constant
symbols and variables to form “terms.” For now, a and
the variables will be our only terms. We do need infinitely many
variables. We’ll officially use the symbols v₀, v₁, ..., as variables.


Definition 5.1. The set of formulas Frm is defined as follows:


1. P(a) and P(vᵢ) are formulas (i ∈ N).

2. If A is a formula, then ¬A is a formula.

3. If A and B are formulas, then (A ∧ B) is a formula.

4. If A is a formula and x is a variable, then ∃x A is a formula.

5. Nothing else is a formula.


(1) tells us that P(a) and P(vᵢ) are formulas, for any i ∈ N.
These are the so-called atomic formulas. They give us something
to start from. The other clauses give us ways of forming new formulas
from ones we have already formed. So for instance, by (2),
we get that ¬P(v₂) is a formula, since P(v₂) is already a formula
by (1). Then, by (4), we get that ∃v₂ ¬P(v₂) is another formula,
and so on. (5) tells us that only strings we can form in this way
count as formulas. In particular, ∃v₀ P(a) and ∃v₀ ∃v₀ P(a) do
count as formulas, and (¬P(a)) does not, because of the extraneous
outer parentheses.

This way of defining formulas is called an inductive definition, 
and it allows us to prove things about formulas using a version of 
proof by induction called structural induction. These are discussed 
in a general way in appendix B.4 and appendix B.5, which you 
should review before delving into the proofs later on. Basically, 
the idea is that if you want to give a proof that something is 
true for all formulas, you show first that it is true for the atomic 
formulas, and then that if it’s true for any formula A (and B),
it’s also true for ¬A, (A ∧ B), and ∃x A. For instance, this proves
that it’s true for ∃v₂ ¬P(v₂): from the first part you know that
it’s true for the atomic formula P(v₂). Then you get that it’s true
for ¬P(v₂) by the second part, and then again that it’s true for
∃v₂ ¬P(v₂) itself. Since all formulas are inductively generated
from atomic formulas, this works for any of them.
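
An inductive definition of formulas translates naturally into a recursive data structure, and structural induction into recursion over that structure. Here is a minimal Python sketch for the simple language with P, a, ¬, ∧, and ∃. The encoding of formulas as nested tuples and all function names are assumptions made for illustration; nothing in the text commits us to this representation.

```python
def atom(term):
    """Clause (1): P(a) or P(v_i); `term` is "a" or a variable name like "v2"."""
    return ("P", term)

def neg(A):
    """Clause (2): the formula ¬A."""
    return ("not", A)

def conj(A, B):
    """Clause (3): the formula (A ∧ B)."""
    return ("and", A, B)

def exists(x, A):
    """Clause (4): the formula ∃x A."""
    return ("exists", x, A)

def show(A):
    """Write a formula as a string, by recursion on how it was built."""
    tag = A[0]
    if tag == "P":
        return f"P({A[1]})"
    if tag == "not":
        return "¬" + show(A[1])
    if tag == "and":
        return f"({show(A[1])} ∧ {show(A[2])})"
    return f"∃{A[1]} {show(A[2])}"

def count_atoms(A):
    """Number of atomic-formula occurrences, again by structural recursion."""
    tag = A[0]
    if tag == "P":
        return 1
    if tag == "not":
        return count_atoms(A[1])
    if tag == "and":
        return count_atoms(A[1]) + count_atoms(A[2])
    return count_atoms(A[2])

example = exists("v2", neg(atom("v2")))
print(show(example))         # ∃v2 ¬P(v2)
print(count_atoms(example))  # 1
```

Both `show` and `count_atoms` have one case per clause of the definition, which is exactly the shape a proof by structural induction has.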


5.4 Satisfaction


We can already skip ahead to the semantics of first-order logic 
once we know what formulas are: here, the basic definition is that 
of a structure. For our simple language, a structure M has just 
three components: a non-empty set |M| called the domain, what 
a picks out in M, and what P is true of in M. The object picked
out by a is denoted a^M and the set of things P is true of by P^M.
A structure M consists of just these three things: |M|, a^M ∈ |M|,
and P^M ⊆ |M|. The general case will be more complicated, since
there will be many predicate symbols and constant symbols, the
predicate symbols can have more than one place, and there will
also be function symbols.

This is enough to give a definition of satisfaction for formulas 
that don’t contain variables. The idea is to give an inductive 
definition that mirrors the way we have defined formulas. We 
specify when an atomic formula is satisfied in M, and then when,
e.g., ¬A is satisfied in M on the basis of whether or not A is
satisfied in M. E.g., we could define:


1. P(a) is satisfied in M iff a^M ∈ P^M.

2. ¬A is satisfied in M iff A is not satisfied in M.

3. (A ∧ B) is satisfied in M iff A is satisfied in M, and B is
satisfied in M as well.


Let’s say that |M| = {0, 1, 2}, a^M = 1, and P^M = {1, 2}. This
definition would tell us that P(a) is satisfied in M (since a^M =
1 ∈ {1, 2} = P^M). It tells us further that ¬P(a) is not satisfied
in M, and that in turn ¬¬P(a) is and (¬P(a) ∧ P(a)) is not
satisfied, and so on.
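
The three clauses can be run directly on this example structure. In the sketch below, the structure is a Python dictionary and formulas are nested tuples such as `("not", ("P", "a"))`; both encodings and the name `satisfied` are our own assumptions, made only for illustration.

```python
# The example structure: domain {0, 1, 2}, a picks out 1, P is true of 1 and 2.
M = {"domain": {0, 1, 2}, "a": 1, "P": {1, 2}}

def satisfied(A, M):
    """Satisfaction for variable-free formulas, one case per clause."""
    tag = A[0]
    if tag == "P":
        if A[1] != "a":
            raise ValueError("variables are not covered by this definition")
        return M["a"] in M["P"]                            # clause 1
    if tag == "not":
        return not satisfied(A[1], M)                      # clause 2
    if tag == "and":
        return satisfied(A[1], M) and satisfied(A[2], M)   # clause 3
    raise ValueError("quantifiers are not covered by this definition")

Pa = ("P", "a")
print(satisfied(Pa, M))                        # True
print(satisfied(("not", Pa), M))               # False
print(satisfied(("not", ("not", Pa)), M))      # True
print(satisfied(("and", ("not", Pa), Pa), M))  # False
```

The two `ValueError` cases mark exactly the gap discussed next: this definition has nothing to say about variables or quantifiers.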

The trouble comes when we want to give a definition for the
quantifiers: we’d like to say something like, “∃v₀ P(v₀) is satisfied
iff P(v₀) is satisfied.” But the structure M doesn’t tell us what to
do about variables. What we actually want to say is that P(v₀)
is satisfied for some value of v₀. To make this precise we need a
way to assign elements of |M| not just to a but also to v₀. To this
end, we introduce variable assignments. A variable assignment is
simply a function s that maps variables to elements of |M| (in our
example, to one of 0, 1, or 2). Since we don’t know beforehand
which variables might appear in a formula we can’t limit which
variables s assigns values to. The simple solution is to require
that s assigns values to all variables v₀, v₁, ... We’ll just use only
the ones we need.

Instead of defining satisfaction of formulas just relative to
a structure, we’ll define it relative to a structure M and a variable
assignment s, and write M, s ⊨ A for short. Our definition
will now include an additional clause to deal with atomic formulas
containing variables:


1. M, s ⊨ P(a) iff a^M ∈ P^M.

2. M, s ⊨ P(vᵢ) iff s(vᵢ) ∈ P^M.

3. M, s ⊨ ¬A iff not M, s ⊨ A.

4. M, s ⊨ (A ∧ B) iff M, s ⊨ A and M, s ⊨ B.


Ok, this solves one problem: we can now say when M satisfies
P(v₀) for the value s(v₀). To get the definition right for
∃v₀ P(v₀) we have to do one more thing: We want to have that
M, s ⊨ ∃v₀ P(v₀) iff M, s′ ⊨ P(v₀) for some way s′ of assigning
a value to v₀. But the value assigned to v₀ does not necessarily
have to be the value that s(v₀) picks out. We’ll introduce a notation
for that: if m ∈ |M|, then we let s[m/v₀] be the assignment
that is just like s (for all variables other than v₀), except to v₀ it
assigns m. Now our definition can be:


5. M, s ⊨ ∃vᵢ A iff M, s[m/vᵢ] ⊨ A for some m ∈ |M|.


Does it work out? Let’s say we let s(vᵢ) = 0 for all i ∈ N. Then M, s ⊨
∃v₀ P(v₀) iff there is an m ∈ |M| so that M, s[m/v₀] ⊨ P(v₀).
And there is: we can choose m = 1 or m = 2. Note that this
is true even if the value s(v₀) assigned to v₀ by s itself (in this
case, 0) doesn’t do the job. We have M, s[1/v₀] ⊨ P(v₀) but not
M, s ⊨ P(v₀).
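
The five clauses, including the one for the quantifier, can be sketched in Python. As before, the tuple encoding of formulas and the dictionary encoding of the structure are assumptions of ours. A variable assignment is modeled as a Python function from variable names to elements of the domain, so that it really does assign a value to every variable; `modified(s, m, x)` plays the role of s[m/x].

```python
M = {"domain": {0, 1, 2}, "a": 1, "P": {1, 2}}

def modified(s, m, x):
    """The assignment s[m/x]: like s, except that x is assigned m."""
    def s_new(y):
        return m if y == x else s(y)
    return s_new

def sat(A, M, s):
    """Satisfaction of formula A in structure M relative to assignment s."""
    tag = A[0]
    if tag == "P":                                    # clauses 1 and 2
        value = M["a"] if A[1] == "a" else s(A[1])
        return value in M["P"]
    if tag == "not":                                  # clause 3
        return not sat(A[1], M, s)
    if tag == "and":                                  # clause 4
        return sat(A[1], M, s) and sat(A[2], M, s)
    if tag == "exists":                               # clause 5
        return any(sat(A[2], M, modified(s, m, A[1])) for m in M["domain"])
    raise ValueError(f"not a formula: {A!r}")

def s(variable):
    """The assignment that gives every variable the value 0."""
    return 0

Pv0 = ("P", "v0")
print(sat(("exists", "v0", Pv0), M, s))   # True
print(sat(Pv0, M, s))                     # False
print(sat(Pv0, M, modified(s, 1, "v0")))  # True
```

The `any(...)` in the quantifier clause only works because the example domain is finite; for an infinite domain the definition still makes sense, but it can no longer be evaluated by trying every element.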

If this looks confusing and cumbersome: it is. But the added 
complexity is required to give a precise, inductive definition of 
satisfaction for all formulas, and we need something like it to 
precisely define the semantic notions. There are other ways of 
doing it, but they are all equally (in)elegant. 


5.5 Sentences


Ok, now we have a (sketch of a) definition of satisfaction (“true 
in”) for structures and formulas. But it needs this additional bit— 
a variable assignment—and what we wanted is a definition of 
sentences. How do we get rid of assignments, and what are sen- 
tences? 

You probably remember a discussion in your first introduction 
to formal logic about the relation between variables and quanti- 
fiers. A quantifier is always followed by a variable, and then in the 
part of the sentence to which that quantifier applies (its “scope”), 
we understand that the variable is “bound” by that quantifier. In 
formulas it was not required that every variable has a matching 
quantifier, and variables without matching quantifiers are “free” 
or “unbound.” We will take sentences to be all those formulas 
that have no free variables. 

Again, the intuitive idea of when an occurrence of a variable 
in a formula A is bound, which quantifier binds it, and when it 
is free, is not difficult to get. You may have learned a method for 
testing this, perhaps involving counting parentheses. We have to 
insist on a precise definition, and because we have defined formulas
by induction, we can give a definition of the free and bound
occurrences of a variable x in a formula A also by induction. E.g.,
it might look like this for our simplified language:


1. If A is atomic, all occurrences of x in it are free (that is, the
occurrence of x in P(x) is free).


2. If A is of the form ¬B, then an occurrence of x in ¬B is
free iff the corresponding occurrence of x is free in B (that
is, the free occurrences of variables in B are exactly the
corresponding occurrences in ¬B).


3. If A is of the form (B ∧ C), then an occurrence of x in (B ∧ C)
is free iff the corresponding occurrence of x is free in B or
in C.


4. If A is of the form ∃x B, then no occurrence of x in A is free;
if it is of the form ∃y B where y is a different variable than x,
then an occurrence of x in ∃y B is free iff the corresponding
occurrence of x is free in B.


Once we have a precise definition of free and bound occur- 
rences of variables, we can simply say: a sentence is any formula 
without free occurrences of variables. 
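
The inductive definition suggests a recursive test for sentencehood. The Python sketch below is a simplification of the definition above: instead of tracking individual occurrences, it computes the set of variables that have at least one free occurrence, which is all we need in order to decide whether a formula is a sentence. The tuple encoding of formulas and the names `free_vars` and `is_sentence` are assumptions made for illustration.

```python
def free_vars(A):
    """The set of variables with at least one free occurrence in A."""
    tag = A[0]
    if tag == "P":
        # The constant symbol a is not a variable; any other term is.
        return set() if A[1] == "a" else {A[1]}
    if tag == "not":
        return free_vars(A[1])
    if tag == "and":
        return free_vars(A[1]) | free_vars(A[2])
    if tag == "exists":
        # The quantifier binds every free occurrence of its variable.
        return free_vars(A[2]) - {A[1]}
    raise ValueError(f"not a formula: {A!r}")

def is_sentence(A):
    """A sentence is a formula without free occurrences of variables."""
    return not free_vars(A)

Px = ("P", "x")
print(free_vars(("and", Px, ("exists", "x", Px))))   # {'x'}
print(is_sentence(("exists", "x", Px)))              # True
print(is_sentence(("exists", "y", Px)))              # False
print(is_sentence(("exists", "x", ("P", "a"))))      # True
```

The first example shows why occurrences matter in the official definition: in (P(x) ∧ ∃x P(x)) the first occurrence of x is free and the last one is bound, so the formula as a whole is not a sentence.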


5.6 Semantic Notions


We mentioned above that when we consider whether M, s ⊨ A
holds, we (for convenience) let s assign values to all variables,
but only the values it assigns to variables in A are used. In fact,
it’s only the values of free variables in A that matter. Of course,
because we’re careful, we are going to prove this fact. Since sentences
have no free variables, s doesn’t matter at all when it comes
to whether or not they are satisfied in a structure. So, when A
is a sentence we can define M ⊨ A to mean “M, s ⊨ A for all s,”
which as it happens is true iff M, s ⊨ A for at least one s. We
need to introduce variable assignments to get a working definition
of satisfaction for formulas, but for sentences, satisfaction is
independent of the variable assignments.

Once we have a definition of “M ⊨ A,” we know what “case”
and “true in” mean as far as sentences of first-order logic are concerned.
On the basis of the definition of M ⊨ A for sentences we
can then define the basic semantic notions of validity, entailment,
and satisfiability. A sentence is valid, ⊨ A, if every structure satisfies
it. It is entailed by a set of sentences, Γ ⊨ A, if every structure
that satisfies all the sentences in Γ also satisfies A. And a set of
sentences is satisfiable if some structure satisfies all sentences in
it at the same time.

Because formulas are inductively defined, and satisfaction is 
in turn defined by induction on the structure of formulas, we can 
use induction to prove properties of our semantics and to relate 
the semantic notions defined. We'll collect and prove some of 
these properties, partly because they are individually interesting, 
but mainly because many of them will come in handy when we go 
on to investigate the relation between semantics and derivation 
systems. In order to do so, we’ll also have to define (precisely, i.e., 
by induction) some syntactic notions and operations we haven’t 
mentioned yet. 


5.7 Substitution


We’ll discuss an example to illustrate how things hang together,
and how the development of syntax and semantics lays the
foundation for our more advanced investigations later. Our derivation
systems should let us derive P(a) from ∀v₀ P(v₀). Maybe we even
want to state this as a rule of inference. However, to do so, we
must be able to state it in the most general terms: not just for P,
a, and v₀, but for any formula A, and term t, and variable x.
(Recall that constant symbols are terms, but we’ll consider also more
complicated terms built from constant symbols and function
symbols.) So we want to be able to say something like, “whenever
you have derived ∀x A(x) you are justified in inferring A(t), the
result of removing ∀x and replacing x by t.” But what exactly
does “replacing x by t” mean? What is the relation between A(x)
and A(t)? Does this always work?

To make this precise, we define the operation of substitution.
Substitution is actually tricky, because we can’t just replace all x’s
in A by t, and not every t can be substituted for any x. We’ll
deal with this, again, using inductive definitions. But once this is
done, specifying an inference rule as “infer A(t) from ∀x A(x)”
becomes a precise definition. Moreover, we’ll be able to show that
this is a good inference rule in the sense that ∀x A(x) entails A(t).
But to prove this, we have to again prove something that may at
first glance prompt you to ask “why are we doing this?” That
∀x A(x) entails A(t) relies on the fact that whether or not M ⊨
A(t) holds depends only on the value of the term t, i.e., if we let
m be whatever element of |M| is picked out by t, then M, s ⊨ A(t)
iff M, s[m/x] ⊨ A(x). This holds even when t contains variables,
but we’ll have to be careful with how exactly we state the result.


5.8 Models and Theories


Once we’ve defined the syntax and semantics of first-order logic, 
we can get to work investigating the properties of structures and 
the semantic notions. We can also define derivation systems, 
and investigate those. For a set of sentences, we can ask: what 
structures make all the sentences in that set true? Given a set of 
sentences Γ, a structure M that satisfies them is called a model
of Γ. We might start from Γ and try to find its models: what do
they look like? How big or small do they have to be? But we might 
also start with a single structure or collection of structures and 
ask: what sentences are true in them? Are there sentences that 
characterize these structures in the sense that they, and only they, 
are true in them? These kinds of questions are the domain of 
model theory. They also underlie the axiomatic method: describing 
a collection of structures by a set of sentences, the axioms of 
a theory. This is made possible by the observation that exactly
those sentences entailed in first-order logic by the axioms are true
in all models of the axioms. 

As a very simple example, consider preorders. A preorder is
a relation R on some set A which is both reflexive and transitive.
A set A with a two-place relation R ⊆ A × A on it is exactly what
we would need to give a structure for a first-order language with
a single two-place relation symbol P: we would set |M| = A and
Pᴹ = R. Since R is a preorder, it is reflexive and transitive, and
we can find a set Γ of sentences of first-order logic that say this:


∀v₀ P(v₀, v₀)
∀v₀ ∀v₁ ∀v₂ ((P(v₀, v₁) ∧ P(v₁, v₂)) → P(v₀, v₂))


These sentences are just the symbolizations of “for any x, Rxx”
(R is reflexive) and “whenever Rxy and Ryz then also Rxz” (R
is transitive). We see that a structure M is a model of these two
sentences Γ iff R (i.e., Pᴹ) is a preorder on A (i.e., |M|). In other
words, the models of Γ are exactly the preorders. Any property
of all preorders that can be expressed in the first-order language
with just P as predicate symbol (like reflexivity and transitivity
above), is entailed by the two sentences in Γ and vice versa. So
anything we can prove about models of Γ we have proved about
all preorders.
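
For finite structures, the claim that the models of Γ are exactly the preorders can be checked mechanically. The sketch below is a rough illustration under stated assumptions: the tuple encoding of formulas and the function names sat, is_model_of_gamma and is_preorder are invented for this example, and the evaluator only covers the few connectives that occur in the two sentences. Satisfaction itself is defined properly later in the text.

```python
def sat(domain, rel, formula, s):
    """Decide M, s |= formula, where M has the given finite domain and P^M = rel.

    Assumed encoding: ("P", x, y), ("and", A, B), ("imp", A, B), ("all", x, A).
    The variable assignment s is a dict from variable names to elements.
    """
    tag = formula[0]
    if tag == "P":
        return (s[formula[1]], s[formula[2]]) in rel
    if tag == "and":
        return sat(domain, rel, formula[1], s) and sat(domain, rel, formula[2], s)
    if tag == "imp":
        return (not sat(domain, rel, formula[1], s)) or sat(domain, rel, formula[2], s)
    if tag == "all":
        x, body = formula[1], formula[2]
        # try every element of the domain as the value of x
        return all(sat(domain, rel, body, {**s, x: m}) for m in domain)
    raise ValueError(f"unsupported formula: {formula!r}")


REFLEXIVE = ("all", "v0", ("P", "v0", "v0"))
TRANSITIVE = ("all", "v0", ("all", "v1", ("all", "v2",
              ("imp", ("and", ("P", "v0", "v1"), ("P", "v1", "v2")),
               ("P", "v0", "v2")))))
GAMMA = [REFLEXIVE, TRANSITIVE]


def is_model_of_gamma(domain, rel):
    """M is a model of Gamma iff it satisfies every sentence in Gamma."""
    return all(sat(domain, rel, a, {}) for a in GAMMA)


def is_preorder(domain, rel):
    """Check reflexivity and transitivity of rel on domain directly."""
    reflexive = all((a, a) in rel for a in domain)
    transitive = all((a, c) in rel
                     for (a, b) in rel for (b2, c) in rel if b == b2)
    return reflexive and transitive


# Divisibility on {1, ..., 6} is a preorder, so it should be a model of Gamma.
DOMAIN = list(range(1, 7))
DIVIDES = {(a, b) for a in DOMAIN for b in DOMAIN if b % a == 0}
```

Because the sentences in Γ have no free variables, the empty assignment suffices as a starting point: every variable is given a value by its quantifier before it is looked up.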

For any particular theory and class of models (such as Γ and
all preorders), there will be interesting questions about what can
be expressed in the corresponding first-order language, and what
cannot be expressed. There are some properties of structures that
are interesting for all languages and classes of models, namely
those concerning the size of the domain. One can always
express, for instance, that the domain contains exactly n elements,
for any n ∈ ℤ⁺. One can also express, using a set of infinitely
many sentences, that the domain is infinite. But one cannot
express that the domain is finite, or that the domain is uncountable.
These results about the limitations of first-order languages are
consequences of the compactness and Löwenheim-Skolem
theorems.


5.9 Soundness and Completeness


We’ll also introduce derivation systems for first-order logic. There 
are many derivation systems that logicians have developed, but 
they all define the same derivability relation between sentences. 
We say that Γ derives A, Γ ⊢ A, if there is a derivation of a certain
precisely defined sort. Derivations are always finite arrangements
of symbols, perhaps a list of sentences, or some more complicated
structure. The purpose of derivation systems is to provide
a tool to determine if a sentence is entailed by some set Γ. In
order to serve that purpose, it must be true that Γ ⊨ A if, and
only if, Γ ⊢ A.

If Γ ⊢ A but not Γ ⊨ A, our derivation system would be
too strong: it would prove too much. The property that if Γ ⊢ A then
Γ ⊨ A is called soundness, and it is a minimal requirement on
any good derivation system. On the other hand, if Γ ⊨ A but
not Γ ⊢ A, then our derivation system is too weak: it doesn’t
prove enough. The property that if Γ ⊨ A then Γ ⊢ A is called
completeness. Soundness is usually relatively easy to prove (by
induction on the structure of derivations, which are inductively
defined). Completeness is harder to prove.

Soundness and completeness have a number of important
consequences. If a set of sentences Γ derives a contradiction
(such as A ∧ ¬A) it is called inconsistent. Inconsistent Γs cannot
have any models; they are unsatisfiable. From completeness the
converse follows: any Γ that is not inconsistent (or, as we will
say, consistent) has a model. In fact, this is equivalent to
completeness, and is the form of completeness we will actually prove.
It is a deep and perhaps surprising result: the mere fact that you
cannot prove A ∧ ¬A from Γ guarantees that there is a structure that
is as Γ describes it. So completeness gives an answer to the
question: which sets of sentences have models? Answer: all and only
consistent sets do.

The soundness and completeness theorems have two important
consequences: the compactness and the Löwenheim-Skolem
theorems. These are important results in the theory of models,
and can be used to establish many interesting results. We’ve
already mentioned two: first-order logic cannot express that the
domain of a structure is finite or that it is uncountable.

Historically, all of this—how to define syntax and semantics 
of first-order logic, how to define good derivation systems, how 
to prove that they are sound and complete, getting clear about 
what can and cannot be expressed in first-order languages—took 
a long time to figure out and get right. We now know how to 
do it, but going through all the details can still be confusing and 
tedious. But it’s also important, because the methods developed 
here for the formal language of first-order logic are applied all 
over the place in logic, computer science, and linguistics. So 
working through the details pays off in the long run. 


CHAPTER 6


Syntax of First-Order Logic


6.1 Introduction 


In order to develop the theory and metatheory of first-order 
logic, we must first define the syntax and semantics of its expres- 
sions. The expressions of first-order logic are terms and formulas. 
Terms are formed from variables, constant symbols, and function 
symbols. Formulas, in turn, are formed from predicate symbols 
together with terms (these form the smallest, “atomic” formu- 
las), and then from atomic formulas we can form more complex 
ones using logical connectives and quantifiers. There are many
different ways to set down the formation rules; we give just one
possible one. Other systems will choose different symbols, will
select different sets of connectives as primitive, will use parentheses
differently (or even not at all, as in the case of so-called Polish
notation). What all approaches have in common, though, is that
the formation rules define the set of terms and formulas
inductively. If done properly, every expression can result essentially
in only one way according to the formation rules. The inductive
definition resulting in expressions that are uniquely readable
means we can give meanings to these expressions using the same
method: inductive definition.


6.2 First-Order Languages 


Expressions of first-order logic are built up from a basic vocab- 
ulary containing variables, constant symbols, predicate symbols and 
sometimes function symbols. From them, together with logical con- 
nectives, quantifiers, and punctuation symbols such as parenthe- 
ses and commas, terms and formulas are formed. 

Informally, predicate symbols are names for properties and 
relations, constant symbols are names for individual objects, and 
function symbols are names for mappings. These, except for 
the identity predicate =, are the non-logical symbols and together 
make up a language. Any first-order language ℒ is determined
by its non-logical symbols. In the most general case, ℒ contains
infinitely many symbols of each kind. 

In the general case, we make use of the following symbols in 
first-order logic: 


1. Logical symbols 


a) Logical connectives: ¬ (negation), ∧ (conjunction),
∨ (disjunction), → (conditional), ∀ (universal quantifier),
∃ (existential quantifier).


b) The propositional constant for falsity ⊥.
c) The two-place identity predicate =.
d) A countably infinite set of variables: v₀, v₁, v₂, ...


2. Non-logical symbols, making up the standard language of 
first-order logic 


a) A countably infinite set of n-place predicate symbols
for each n > 0: Aⁿ₀, Aⁿ₁, Aⁿ₂, ...


b) A countably infinite set of constant symbols: c₀, c₁,
c₂, ...


c) A countably infinite set of n-place function symbols
for each n > 0: fⁿ₀, fⁿ₁, fⁿ₂, ...


3. Punctuation marks: (, ), and the comma. 


Most of our definitions and results will be formulated for the 
full standard language of first-order logic. However, depending 
on the application, we may also restrict the language to only a 
few predicate symbols, constant symbols, and function symbols. 


Example 6.1. The language ℒ_A of arithmetic contains a single
two-place predicate symbol <, a single constant symbol 0, one
one-place function symbol ′, and two two-place function
symbols + and ×.


Example 6.2. The language of set theory ℒ_Z contains only the
single two-place predicate symbol ∈.


Example 6.3. The language of orders ℒ_≤ contains only the
two-place predicate symbol ≤.


Again, these are conventions: officially, these are just aliases,
e.g., <, ∈, and ≤ are aliases for A²₀, 0 for c₀, ′ for f¹₀, + for f²₀,
and × for f²₁.

In addition to the primitive connectives and quantifiers
introduced above, we also use the following defined symbols: ↔
(biconditional) and ⊤ (truth).

A defined symbol is not officially part of the language, but 
is introduced as an informal abbreviation: it allows us to abbre- 
viate formulas which would, if we only used primitive symbols, 
get quite long. This is obviously an advantage. The bigger ad- 
vantage, however, is that proofs become shorter. If a symbol is 
primitive, it has to be treated separately in proofs. The more 
primitive symbols, therefore, the longer our proofs.

You may be familiar with different terminology and symbols
than the ones we use above. Logic texts (and teachers)
commonly use ∼, ¬, or ! for “negation”, and ∧, ·, or & for “conjunction”.
Commonly used symbols for the “conditional” or “implication”
are →, ⇒, and ⊃. Symbols for “biconditional,” “bi-implication,”
or “(material) equivalence” are ↔, ⇔, and ≡. The ⊥ symbol
is variously called “falsity,” “falsum,” “absurdity,” or “bottom.”
The ⊤ symbol is variously called “truth,” “verum,” or “top.”

It is conventional to use lower case letters (e.g., a, b, c) from
the beginning of the Latin alphabet for constant symbols
(sometimes called names), and lower case letters from the end (e.g., x,
y, z) for variables. Quantifiers combine with variables, e.g., x;
notational variations include ∀x, (∀x), (x), Πx, ⋀x for the
universal quantifier and ∃x, (∃x), (Ex), Σx, ⋁x for the existential
quantifier.

We might treat all the propositional operators and both quan- 
tifiers as primitive symbols of the language. We might instead 
choose a smaller stock of primitive symbols and treat the other 
logical operators as defined. “Truth functionally complete” sets
of Boolean operators include {¬, ∨}, {¬, ∧}, and {¬, →}; these
can be combined with either quantifier for an expressively
complete first-order language.

You may be familiar with two other logical operators: the 
Sheffer stroke | (named after Henry Sheffer), and Peirce’s
arrow ↓, also known as Quine’s dagger. When given their usual
readings of “nand” and “nor” (respectively), these operators are 
truth functionally complete by themselves. 
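
As a quick sanity check of the claim about the Sheffer stroke, the following Python sketch defines negation, conjunction, disjunction and the conditional using “nand” alone and compares them with the usual truth tables. The function names are made up for this illustration. Since {¬, ∧} is already truth functionally complete, recovering ¬ and ∧ from “nand” would be enough; the other two definitions are included only for comparison.

```python
def nand(p, q):
    """The Sheffer stroke: true unless both arguments are true."""
    return not (p and q)


# Each operator below is built from nand only.
def neg(p):
    return nand(p, p)


def conj(p, q):
    return nand(nand(p, q), nand(p, q))


def disj(p, q):
    return nand(nand(p, p), nand(q, q))


def cond(p, q):
    return nand(p, nand(q, q))


BOOLS = (False, True)

# Compare every definition with the intended truth table on all inputs.
AGREES = all(
    neg(p) == (not p)
    and conj(p, q) == (p and q)
    and disj(p, q) == (p or q)
    and cond(p, q) == ((not p) or q)
    for p in BOOLS
    for q in BOOLS
)
```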


6.3 Terms and Formulas 


Once a first-order language ℒ is given, we can define expressions
built up from the basic vocabulary of ℒ. These include in
particular terms and formulas.


Definition 6.4 (Terms). The set of terms Trm(ℒ) of ℒ is
defined inductively by:


1. Every variable is a term.
2. Every constant symbol of ℒ is a term.


3. If f is an n-place function symbol and t₁, ..., tₙ are terms,
then f(t₁, ..., tₙ) is a term.


4. Nothing else is a term.


A term containing no variables is a closed term. 


The constant symbols appear in our specification of the lan- 
guage and the terms as a separate category of symbols, but they 
could instead have been included as zero-place function symbols. 
We could then do without the second clause in the definition of 
terms. We just have to understand f(t₁, ..., tₙ) as just f by itself
if n = 0.
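
The inductive definition of terms can be read as a recursive test. The sketch below is one possible rendering in Python, under assumptions made for the sake of the example: terms are encoded as strings and nested tuples, variables are the strings v0, v1, v2, ..., and the dictionary ARITHMETIC with the symbol names succ, plus and times stands in for the language of arithmetic. None of these names come from the text.

```python
# Hypothetical description of a language: its constant symbols and the
# number of places (arity) of each function symbol.
ARITHMETIC = {
    "constants": {"0"},
    "functions": {"succ": 1, "plus": 2, "times": 2},
}


def is_variable(t):
    """Assumption: variables are the strings v0, v1, v2, ..."""
    return isinstance(t, str) and t[:1] == "v" and t[1:].isdigit()


def is_term(t, language):
    """Clause-by-clause check following the definition of terms."""
    if isinstance(t, str):
        # clauses 1 and 2: variables and constant symbols are terms
        return is_variable(t) or t in language["constants"]
    if isinstance(t, tuple) and t and t[0] in language["functions"]:
        # clause 3: f(t1, ..., tn) for an n-place f and terms t1, ..., tn
        f, args = t[0], t[1:]
        return (len(args) == language["functions"][f]
                and all(is_term(arg, language) for arg in args))
    # clause 4: nothing else is a term
    return False


def variables_of(t):
    """Return the set of variables occurring in the term t."""
    if isinstance(t, str):
        return {t} if is_variable(t) else set()
    found = set()
    for arg in t[1:]:
        found |= variables_of(arg)
    return found


def is_closed(t, language):
    """A closed term is a term containing no variables."""
    return is_term(t, language) and not variables_of(t)
```

The recursion terminates because each recursive call is made on a strictly smaller tuple, which mirrors the fact that every term is built from earlier terms.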


Definition 6.5 (Formulas). The set of formulas Frm(ℒ) of the
language ℒ is defined inductively as follows:


1. ⊥ is an atomic formula.


2. If R is an n-place predicate symbol of ℒ and t₁, ..., tₙ are
terms of ℒ, then R(t₁, ..., tₙ) is an atomic formula.


3. If t₁ and t₂ are terms of ℒ, then =(t₁, t₂) is an atomic
formula.


4. If A is a formula, then ¬A is a formula.


5. If A and B are formulas, then (A ∧ B) is a formula.


6. If A and B are formulas, then (A ∨ B) is a formula.


7. If A and B are formulas, then (A → B) is a formula.


8. If A is a formula and x is a variable, then ∀x A is a formula.


9. If A is a formula and x is a variable, then ∃x A is a formula.


10. Nothing else is a formula.


The definitions of the set of terms and that of formulas are 
inductive definitions. Essentially, we construct the set of formu- 
las in infinitely many stages. In the initial stage, we pronounce 
all atomic formulas to be formulas; this corresponds to the first 
few cases of the definition, i.e., the cases for ⊥, R(t₁, ..., tₙ) and
=(t₁, t₂). “Atomic formula” thus means any formula of this form.

The other cases of the definition give rules for constructing 
new formulas out of formulas already constructed. At the second 
stage, we can use them to construct formulas out of atomic for- 
mulas. At the third stage, we construct new formulas from the 
atomic formulas and those obtained in the second stage, and so 
on. A formula is anything that is eventually constructed at such 
a stage, and nothing else. 
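
The stage-by-stage picture can be made concrete for a toy fragment. In the Python sketch below, which is an illustration and not part of the official definition, we assume just two atomic formulas (encoded as the tuples ("P",) and ("Q",)) and only the formation rules for ¬ and ∧, so that each stage stays small enough to enumerate. The names next_stage and stages are hypothetical.

```python
ATOMS = {("P",), ("Q",)}  # assumed atomic formulas, for illustration only


def next_stage(formulas):
    """Apply the rules for negation and conjunction to all earlier formulas."""
    new = set(ATOMS)
    new |= {("not", a) for a in formulas}
    new |= {("and", a, b) for a in formulas for b in formulas}
    return new


def stages(n):
    """Return the first n stages; each stage also keeps everything built before."""
    current = set(ATOMS)           # initial stage: the atomic formulas
    result = [current]
    for _ in range(n - 1):
        current = current | next_stage(current)
        result.append(current)
    return result
```

With two atoms, the initial stage has 2 formulas, the second has 8 (the atoms, their negations, and the four conjunctions of atoms), and the third has 74. A formula of this fragment is anything that shows up in some stage.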

By convention, we write = between its arguments and leave
out the parentheses: t₁ = t₂ is an abbreviation for =(t₁, t₂).
Moreover, ¬=(t₁, t₂) is abbreviated as t₁ ≠ t₂. When writing a formula
(B ∗ C) constructed from B, C using a two-place connective ∗, we
will often leave out the outermost pair of parentheses and write
simply B ∗ C.

Some logic texts require that the variable x must occur in A 
in order for ∃x A and ∀x A to count as formulas. Nothing bad
happens if you don’t require this, and it makes things easier. 


Definition 6.6. Formulas constructed using the defined opera- 
tors are to be understood as follows: 


1. ⊤ abbreviates ¬⊥.


2. A ↔ B abbreviates (A → B) ∧ (B → A).
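
Since defined operators are abbreviations, they can always be eliminated. The following Python sketch shows one way to unfold them, assuming a tuple encoding in which ("top",) stands for ⊤ and ("iff", A, B) for A ↔ B. The encoding and the name expand are assumptions of this example.

```python
def expand(a):
    """Replace the defined symbols by the primitive formulas they abbreviate."""
    tag = a[0]
    if tag == "top":
        return ("not", ("bot",))
    if tag == "iff":
        b, c = expand(a[1]), expand(a[2])
        return ("and", ("imp", b, c), ("imp", c, b))
    if tag == "not":
        return ("not", expand(a[1]))
    if tag in ("and", "or", "imp"):
        return (tag, expand(a[1]), expand(a[2]))
    if tag in ("all", "ex"):
        return (tag, a[1], expand(a[2]))
    return a  # atomic formulas contain no defined symbols
```

Note that unfolding a biconditional copies both of its sides, so formulas can grow considerably when abbreviations are nested.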


If we work in a language for a specific application, we will 
often write two-place predicate symbols and function symbols 
between the respective terms, e.g., t₁ < t₂ and (t₁ + t₂) in the
language of arithmetic and t₁ ∈ t₂ in the language of set
theory. The successor function in the language of arithmetic is even
written conventionally after its argument: t′. Officially, however,
these are just conventional abbreviations for A²₀(t₁, t₂), f²₀(t₁, t₂),
A²₀(t₁, t₂) and f¹₀(t), respectively.


Definition 6.7 (Syntactic identity). The symbol ≡ expresses
syntactic identity between strings of symbols, i.e., A ≡ B iff A
and B are strings of symbols of the same length and which
contain the same symbol in each place.


The ≡ symbol may be flanked by strings obtained by
concatenation, e.g., A ≡ (B ∨ C) means: the string of symbols A is
the same string as the one obtained by concatenating an opening
parenthesis, the string B, the ∨ symbol, the string C, and a
closing parenthesis, in this order. If this is the case, then we know
that the first symbol of A is an opening parenthesis, A contains
B as a substring (starting at the second symbol), that substring
is followed by ∨, etc.

As terms and formulas are built up from basic elements via in- 
ductive definitions, we can use the following induction principles 
to prove things about them. 


Lemma 6.8 (Principle of induction on terms). Let ℒ be a
first-order language. If some property P holds in all of the following cases,
then P(t) for every t ∈ Trm(ℒ).


1. P(v) for every variable v,
2. P(a) for every constant symbol a of ℒ,


3. If t₁, ..., tₙ ∈ Trm(ℒ), f is an n-place function symbol of ℒ,
and P(t₁), ..., P(tₙ), then P(f(t₁, ..., tₙ)).


Lemma 6.9 (Principle of induction on formulas). Let ℒ be a
first-order language. If some property P is such that


1. it holds for all atomic formulas;

2. it holds for ¬A whenever it holds for A;

3. it holds for (A ∧ B) whenever it holds for A and B;
4. it holds for (A ∨ B) whenever it holds for A and B;
5. it holds for (A → B) whenever it holds for A and B;
6. it holds for ∃x A whenever it holds for A;

7. it holds for ∀x A whenever it holds for A;


then P holds for all formulas A ∈ Frm(ℒ).


6.4 Unique Readability 


The way we defined formulas guarantees that every formula has 
a unique reading, i.e., there is essentially only one way of con- 
structing it according to our formation rules for formulas and 
only one way of “interpreting” it. If this were not so, we would 
have ambiguous formulas, i.e., formulas that have more than one 
reading or interpretation, and that is clearly something we want
to avoid. But more importantly, without this property, most of the 
definitions and proofs we are going to give will not go through. 

Perhaps the best way to make this clear is to see what would 
happen if we had given bad rules for forming formulas that would 
not guarantee unique readability. For instance, we could have 
forgotten the parentheses in the formation rules for connectives, 
e.g., we might have allowed this: 


If A and B are formulas, then so is A → B.


Starting from an atomic formula D, this would allow us to form
D → D. From this, together with D, we would get D → D → D.
But there are two ways to do this:


1. We take D to be A and D → D to be B.
2. We take A to be D → D and B to be D.


Correspondingly, there are two ways to “read” the formula D →
D → D. It is of the form B → C where B is D and C is D → D, but
it is also of the form B → C with B being D → D and C being D.

If this happens, our definitions will not always work. For
instance, when we define the main operator of a formula, we say: in
a formula of the form B → C, the main operator is the indicated
occurrence of →. But if we can match the formula D → D → D
with B → C in the two different ways mentioned above, then in
one case we get the first occurrence of → as the main operator,
and in the second case the second occurrence. But we intend the
main operator to be a function of the formula, i.e., every formula
must have exactly one main operator occurrence.
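
To see the ambiguity concretely, one can enumerate all ways of reading a parenthesis-free chain of conditionals under the bad rule. The Python sketch below does this; the names readings and value and the tuple encoding ("imp", B, C) are assumptions of the example. It also evaluates the two readings of D → D → D truth functionally, as an informal hint at why the ambiguity matters for semantics, even though truth values have not been officially introduced at this point.

```python
def readings(atoms):
    """All ways to read A1 -> A2 -> ... -> An when parentheses are omitted.

    atoms is the list of atomic formulas in the chain, in order.
    """
    if len(atoms) == 1:
        return [atoms[0]]
    result = []
    for i in range(1, len(atoms)):
        # the arrow between atoms[i - 1] and atoms[i] is taken as main operator
        for left in readings(atoms[:i]):
            for right in readings(atoms[i:]):
                result.append(("imp", left, right))
    return result


def value(reading, d):
    """Truth value of a reading when every atom D has truth value d."""
    if isinstance(reading, str):
        return d
    _, left, right = reading
    return (not value(left, d)) or value(right, d)


TWO_READINGS = readings(["D", "D", "D"])
```

The list TWO_READINGS contains exactly the two readings discussed above, and when D is false they receive different truth values. With the parentheses required by the official formation rules, only one decomposition is possible.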


Proof. We prove by induction on the way A is constructed that
every formula A contains as many left as right parentheses.
This requires two things: (a) We have to prove first that all atomic
formulas have the property in question (the induction basis). (b)
Then we have to prove that when we construct new formulas out
of given formulas, the new formulas have the property provided
the old ones do.

Let l(A) be the number of left parentheses, and r(A) the number
of right parentheses in A, and l(t) and r(t) similarly the number
of left and right parentheses in a term t.


1. A ≡ ⊥: A has 0 left and 0 right parentheses.


2. A ≡ R(t₁, ..., tₙ): l(A) = 1 + l(t₁) + ··· + l(tₙ) = 1 + r(t₁) +
··· + r(tₙ) = r(A). Here we make use of the fact, left as an
exercise, that l(t) = r(t) for any term t.


3. A ≡ t₁ = t₂: l(A) = l(t₁) + l(t₂) = r(t₁) + r(t₂) = r(A).


4. A ≡ ¬B: By induction hypothesis, l(B) = r(B). Thus
l(A) = l(B) = r(B) = r(A).


5. A ≡ (B ∗ C): By induction hypothesis, l(B) = r(B) and
l(C) = r(C). Thus l(A) = 1 + l(B) + l(C) = 1 + r(B) + r(C) =
r(A).


6. A ≡ ∀x B: By induction hypothesis, l(B) = r(B). Thus,
l(A) = l(B) = r(B) = r(A).


7. A ≡ ∃x B: Similarly. □


Definition 6.11 (Proper prefix). A string of symbols B is a 
proper prefix of a string of symbols A if concatenating B and a 
non-empty string of symbols yields A. 


Lemma 6.12. If A is a formula, and B is a proper prefix of A, then
B is not a formula.


Proof. Exercise. □


Proposition 6.13. If A is an atomic formula, then it satisfies one,
and only one of the following conditions.


1. A ≡ ⊥.


2. A ≡ R(t₁, ..., tₙ) where R is an n-place predicate symbol, t₁, ...,
tₙ are terms, and each of R, t₁, ..., tₙ is uniquely determined.


3. A ≡ t₁ = t₂ where t₁ and t₂ are uniquely determined terms.


Proof. Exercise. □


Proposition 6.14 (Unique Readability). Every formula satisfies
one, and only one of the following conditions.


1. A is atomic.

2. A is of the form ¬B.

3. A is of the form (B ∧ C).
4. A is of the form (B ∨ C).
5. A is of the form (B → C).
6. A is of the form ∀x B.

7. A is of the form ∃x B.


Moreover, in each case B, or B and C, are uniquely determined. This
means that, e.g., there are no different pairs B, C and B′, C′ so that A
is both of the form (B → C) and (B′ → C′).


Proof. The formation rules require that if a formula is not atomic,
it must start with an opening parenthesis (, ¬, or a quantifier. On
the other hand, every formula that starts with one of the following
symbols must be atomic: a predicate symbol, a function symbol,
a constant symbol, ⊥.

So we really only have to show that if A is of the form (B ∗ C)
and also of the form (B′ ∗′ C′), then B ≡ B′, C ≡ C′, and ∗ = ∗′.

So suppose both A ≡ (B ∗ C) and A ≡ (B′ ∗′ C′). Then either
B ≡ B′ or not. If it is, clearly ∗ = ∗′ and C ≡ C′, since they then
are substrings of A that begin in the same place and are of the
same length. The other case is B ≢ B′. Since B and B′ are both
substrings of A that begin at the same place, one must be a proper
prefix of the other. But this is impossible by Lemma 6.12. □


6.5 Main Operator of a Formula


It is often useful to talk about the last operator used in construct- 
ing a formula A. This operator is called the main operator of A. 
Intuitively, it is the “outermost” operator of A. For example, the 
main operator of ¬A is ¬, the main operator of (A ∨ B) is ∨, etc.


Definition 6.15 (Main operator). The main operator of a for- 
mula A is defined as follows: 


1. A is atomic: A has no main operator.
2. A ≡ ¬B: the main operator of A is ¬.


3. A ≡ (B ∧ C): the main operator of A is ∧.


4. A ≡ (B ∨ C): the main operator of A is ∨.


5. A ≡ (B → C): the main operator of A is →.
6. A ≡ ∀x B: the main operator of A is ∀.


7. A ≡ ∃x B: the main operator of A is ∃.


In each case, we intend the specific indicated occurrence of the 
main operator in the formula. For instance, since the formula 
((D → E) → (E → D)) is of the form (B → C) where B is (D → E) and
C is (E → D), the second occurrence of → is the main operator.

This is a recursive definition of a function which maps all non- 
atomic formulas to their main operator occurrence. Because of 
the way formulas are defined inductively, every formula A satis- 
fies one of the cases in Definition 6.15. This guarantees that for 
each non-atomic formula A a main operator exists. Because each 
formula satisfies only one of these conditions, and because the 
smaller formulas from which A is constructed are uniquely deter- 
mined in each case, the main operator occurrence of A is unique, 
and so we have defined a function. 
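
Because of unique readability, the main operator occurrence can be located by a simple scan of the string. The Python sketch below is an illustration resting on several assumptions: the input is a well-formed formula written in official notation (all parentheses present, no defined operators), every symbol is a single character, and there are no spaces. The function name main_operator is hypothetical. The idea is that in (B ∗ C) the main connective is the only two-place connective that is enclosed by exactly one pair of parentheses.

```python
BINARY = {"∧", "∨", "→"}


def main_operator(formula):
    """Return (symbol, index) of the main operator occurrence, or None if atomic.

    Assumes formula is a well-formed formula string in official notation.
    """
    first = formula[0]
    if first in ("¬", "∀", "∃"):
        return first, 0
    if first != "(":
        return None  # atomic formulas have no main operator
    depth = 0
    for i, symbol in enumerate(formula):
        if symbol == "(":
            depth += 1
        elif symbol == ")":
            depth -= 1
        elif symbol in BINARY and depth == 1:
            # connectives inside B or C sit at depth 2 or more
            return symbol, i
    raise ValueError("not a formula in official notation")
```

Applied to the string ((D→E)→(E→D)), the scan skips the first → (it is at depth 2) and returns the second one, in agreement with the example above.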

We call formulas by the names in Table 6.1 depending on
which symbol their main operator is. Recall, however, that
defined operators do not officially appear in formulas. They are
just abbreviations, so officially they cannot be the main operator
of a formula. In proofs about all formulas they therefore do not
have to be treated separately.


Main operator   Type of formula          Example
none            atomic (formula)         ⊥, R(t₁, ..., tₙ), t₁ = t₂
¬               negation                 ¬A
∧               conjunction              (A ∧ B)
∨               disjunction              (A ∨ B)
→               conditional              (A → B)
↔               biconditional            (A ↔ B)
∀               universal (formula)      ∀x A
∃               existential (formula)    ∃x A


Table 6.1: Main operator and names of formulas


6.6 Subformulas 


It is often useful to talk about the formulas that “make up” a 
given formula. We call these its subformulas. Any formula counts 
as a subformula of itself; a subformula of A other than A itself is 
a proper subformula. 


Definition 6.16 (Immediate Subformula). If A is a formula, 
the immediate subformulas of A are defined inductively as follows: 


1. Atomic formulas have no immediate subformulas. 


2. A ≡ ¬B: The only immediate subformula of A is B.


3. A ≡ (B ∗ C): The immediate subformulas of A are B and
C (∗ is any one of the two-place connectives).


4. A ≡ ∀x B: The only immediate subformula of A is B.


5. A ≡ ∃x B: The only immediate subformula of A is B.


Definition 6.17 (Proper Subformula). If A is a formula, the
proper subformulas of A are defined recursively as follows:


1. Atomic formulas have no proper subformulas.


2. A ≡ ¬B: The proper subformulas of A are B together with
all proper subformulas of B.


3. A ≡ (B ∗ C): The proper subformulas of A are B, C,
together with all proper subformulas of B and those of C.


4. A ≡ ∀x B: The proper subformulas of A are B together
with all proper subformulas of B.


5. A ≡ ∃x B: The proper subformulas of A are B together
with all proper subformulas of B.


Definition 6.18 (Subformula). The subformulas of A are A it- 
self together with all its proper subformulas. 
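
The three definitions can be sketched in Python as follows. The tuple encoding of formulas and the function names are assumptions of this example. Note how immediate_subformulas is a plain definition by cases, while proper_subformulas calls itself on the smaller formulas.

```python
# Assumed encoding: ("bot",), ("pred", R, t1, ..., tn), ("eq", t1, t2),
# ("not", A), ("and", A, B), ("or", A, B), ("imp", A, B),
# ("all", x, A), ("ex", x, A).

def immediate_subformulas(a):
    """Explicit definition by cases on the form of a."""
    tag = a[0]
    if tag == "not":
        return {a[1]}
    if tag in ("and", "or", "imp"):
        return {a[1], a[2]}
    if tag in ("all", "ex"):
        return {a[2]}
    return set()  # atomic formulas have no immediate subformulas


def proper_subformulas(a):
    """Recursive definition: the immediate subformulas and their proper subformulas."""
    found = set()
    for b in immediate_subformulas(a):
        found.add(b)
        found |= proper_subformulas(b)
    return found


def subformulas(a):
    """The formula itself together with all its proper subformulas."""
    return {a} | proper_subformulas(a)
```

Since the results are sets, a subformula that occurs several times in a formula is listed only once.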


Note the subtle difference in how we have defined immediate 
subformulas and proper subformulas. In the first case, we have 
directly defined the immediate subformulas of a formula A for 
each possible form of A. It is an explicit definition by cases, and 
the cases mirror the inductive definition of the set of formulas. 
In the second case, we have also mirrored the way the set of all 
formulas is defined, but in each case we have also included the 
proper subformulas of the smaller formulas B, C in addition to
these formulas themselves. This makes the definition recursive. In
general, a definition of a function on an inductively defined set
(in our case, formulas) is recursive if the cases in the definition of
the function make use of the function itself. To be well defined,
we must make sure, however, that we only ever use the values
of the function for arguments that come “before” the one we are
defining: in our case, when defining “proper subformula” for
(B ∗ C) we only use the proper subformulas of the “earlier” formulas
B and C.


6.7 Formation Sequences


Defining formulas via an inductive definition, and the comple- 
mentary technique of proving properties of formulas via induc- 
tion, is an elegant and efficient approach. However, it can also 
be useful to consider a more bottom-up, step-by-step approach to 
the construction of formulas, which we do here using the notion 
of a formation sequence. To show how terms and formulas can be 
introduced in this way without needing to refer to their inductive 
definitions, we first introduce the notion of an arbitrary string of 
symbols drawn from some language ℒ.


Definition 6.21 (Strings). Suppose ℒ is a first-order language.
An ℒ-string is a finite sequence of symbols of ℒ. Where the
language ℒ is clearly fixed by the context, we will often refer to
an ℒ-string as a string simpliciter.


Example 6.22. For any first-order language ℒ, all ℒ-formulas
are ℒ-strings, but not conversely. For example,


)(v₀ → ∃


is an ℒ-string but not an ℒ-formula.


Definition 6.23 (Formation sequences for terms). A finite
sequence of ℒ-strings ⟨t₀, ..., tₙ⟩ is a formation sequence for a term
t if t ≡ tₙ and for all i ≤ n, either tᵢ is a variable or a constant
symbol, or ℒ contains a k-ary function symbol f and there exist
m₁, ..., mₖ < i such that tᵢ ≡ f(t_{m₁}, ..., t_{mₖ}).


Example 6.24. The sequence

⟨c₀, v₀, f²₀(c₀, v₀), f¹₀(f²₀(c₀, v₀))⟩


is a formation sequence for the term f¹₀(f²₀(c₀, v₀)), as is


⟨v₀, c₀, f²₀(c₀, v₀), f¹₀(f²₀(c₀, v₀))⟩.


Definition 6.25 (Formation sequences for formulas). A finite
sequence of ℒ-strings ⟨A₀, ..., Aₙ⟩ is a formation sequence
for A if A ≡ Aₙ and for all i ≤ n, either Aᵢ is an atomic formula
or there exist j, k < i and a variable x such that one of the
following holds:


1. Aᵢ ≡ ¬Aⱼ.


2. Aᵢ ≡ (Aⱼ ∧ Aₖ).


3. Aᵢ ≡ (Aⱼ ∨ Aₖ).


4. Aᵢ ≡ (Aⱼ → Aₖ).


5. Aᵢ ≡ ∀x Aⱼ.


6. Aᵢ ≡ ∃x Aⱼ.


Example 6.26. The sequence

⟨A¹₀(v₀), A¹₁(c₁), (A¹₁(c₁) ∧ A¹₀(v₀)), ∃v₀ (A¹₁(c₁) ∧ A¹₀(v₀))⟩

is a formation sequence of ∃v₀ (A¹₁(c₁) ∧ A¹₀(v₀)), as is

⟨A¹₀(v₀), A¹₁(c₁), (A¹₁(c₁) ∧ A¹₀(v₀)), A¹₁(c₁),
∀v₁ A¹₀(v₀), ∃v₀ (A¹₁(c₁) ∧ A¹₀(v₀))⟩.

As can be seen from the second example, formation sequences
may contain “junk”: formulas which are redundant or do not
contribute to the construction.


Proposition 6.27. Every formula A in Frm(ℒ) has a formation
sequence.


Proof. Suppose A is atomic. Then the sequence ⟨A⟩ is a formation
sequence for A. Now suppose that B and C have formation
sequences ⟨B₀, …, Bₙ⟩ and ⟨C₀, …, Cₘ⟩ respectively, so that
B = Bₙ and C = Cₘ.


1. If A = ¬B, then ⟨B₀, …, Bₙ, ¬Bₙ⟩ is a formation sequence
for A.


2. If A = (B ∧ C), then ⟨B₀, …, Bₙ, C₀, …, Cₘ, (Bₙ ∧ Cₘ)⟩ is a
formation sequence for A.


3. If A = (B ∨ C), then ⟨B₀, …, Bₙ, C₀, …, Cₘ, (Bₙ ∨ Cₘ)⟩ is a
formation sequence for A.


4. If A = (B → C), then ⟨B₀, …, Bₙ, C₀, …, Cₘ, (Bₙ → Cₘ)⟩ is
a formation sequence for A.


5. If A = ∀x B, then ⟨B₀, …, Bₙ, ∀x Bₙ⟩ is a formation
sequence for A.


6. If A = ∃x B, then ⟨B₀, …, Bₙ, ∃x Bₙ⟩ is a formation
sequence for A.


By the principle of induction on formulas, every formula has a
formation sequence. □
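

The conditions of Definition 6.25 and the construction used in the
proof above can be tried out mechanically. The following Python
sketch is only an illustration: the encoding of formulas (atomic
formulas as strings, complex formulas as tuples) and the function
names are assumptions of ours, not part of the text.

```python
# Assumed encoding (not from the text): atomic formulas are strings;
# complex formulas are tuples ("not", B), ("and", B, C), ("or", B, C),
# ("imp", B, C), ("all", "x", B), ("ex", "x", B).
BINARY = {"and", "or", "imp"}
QUANTIFIERS = {"all", "ex"}


def is_atomic(f):
    return isinstance(f, str)


def is_formation_sequence(seq, target):
    """Check the conditions of Definition 6.25 for seq and target."""
    if not seq or seq[-1] != target:
        return False
    for i, f in enumerate(seq):
        earlier = seq[:i]  # only entries with a smaller index may be used
        if is_atomic(f):
            continue
        if not isinstance(f, tuple) or not f:
            return False
        op = f[0]
        if op == "not" and len(f) == 2 and f[1] in earlier:
            continue
        if op in BINARY and len(f) == 3 and f[1] in earlier and f[2] in earlier:
            continue
        if op in QUANTIFIERS and len(f) == 3 and f[2] in earlier:
            continue
        return False
    return True


def formation_sequence(f):
    """Build a formation sequence by recursion on f, as in the proof above."""
    if is_atomic(f):
        return [f]
    if f[0] == "not":
        return formation_sequence(f[1]) + [f]
    if f[0] in BINARY:
        return formation_sequence(f[1]) + formation_sequence(f[2]) + [f]
    return formation_sequence(f[2]) + [f]  # quantifier case


# The two sequences of Example 6.26, in the assumed encoding.
a0, a1 = "A0(v0)", "A1(c1)"
conj = ("and", a1, a0)
target = ("ex", "v0", conj)
short = [a0, a1, conj, target]
junky = [a0, a1, conj, a1, ("all", "v1", a0), target]
```

Both `short` and `junky` pass the check, while a sequence that lists
the conjunction before its conjuncts does not. Comparing entries with
`==` treats syntactically identical formulas as the same string, which
matches the definition.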


We can also prove the converse. This is important because 
it shows that our two ways of defining formulas are equivalent: 
they give the same results. It also means that we can prove the- 
orems about formulas by using ordinary induction on the length 
of formation sequences.


Lemma 6.28. Suppose that ⟨A₀, …, Aₙ⟩ is a formation sequence
for Aₙ, and that k ≤ n. Then ⟨A₀, …, Aₖ⟩ is a formation sequence
for Aₖ.


Proof. Exercise. □


Theorem 6.29. Frm(ℒ) is the set of all strings of symbols in the
language ℒ that have a formation sequence.


Proof. Let F be the set of all strings of symbols in the language ℒ
that have a formation sequence. We have seen in Proposition 6.27
that Frm(ℒ) ⊆ F, so now we prove the converse.

Suppose A has a formation sequence ⟨A₀, …, Aₙ⟩. We prove
that A ∈ Frm(ℒ) by strong induction on n. Our induction
hypothesis is that every string of symbols with a formation
sequence ⟨B₀, …, Bₘ⟩ with m < n is in Frm(ℒ). By the definition
of a formation sequence, either Aₙ is atomic or there must exist
j, k < n such that one of the following is the case:


1. Aₙ = ¬Aⱼ.

2. Aₙ = (Aⱼ ∧ Aₖ).

3. Aₙ = (Aⱼ ∨ Aₖ).

4. Aₙ = (Aⱼ → Aₖ).

5. Aₙ = ∀x Aⱼ.

6. Aₙ = ∃x Aⱼ.

Now we reason by cases. If Aₙ is atomic then Aₙ ∈ Frm(ℒ).
Suppose instead that Aₙ = (Aⱼ ∧ Aₖ). By Lemma 6.28, ⟨A₀, …, Aⱼ⟩
and ⟨A₀, …, Aₖ⟩ are formation sequences for Aⱼ and Aₖ,
respectively. Since these are proper initial subsequences of the
formation sequence for A and j, k < n, the induction hypothesis
applies to both. Therefore Aⱼ and Aₖ are in Frm(ℒ), and by the
definition of a formula, so is (Aⱼ ∧ Aₖ). The other cases follow
by parallel reasoning. □


Formation sequences for terms have similar properties to
those for formulas.


Proposition 6.30. Trm(ℒ) is the set of all strings of symbols in
the language ℒ that have a formation sequence for terms.


Proof. Exercise. □


There are two types of “junk” that can appear in formation
sequences: repeated elements, and elements that are irrelevant
to the construction of the formula or term. We can eliminate
both by looking at minimal formation sequences.


Definition 6.31 (Minimal formation sequences). A formation
sequence ⟨A₀, …, Aₙ⟩ for A is a minimal formation sequence
for A if for every other formation sequence s for A, the length
of s is greater than or equal to n + 1.


Proof. Exercise. □


6.8 Free Variables and Sentences 


Definition 6.33 (Free occurrences of a variable). The free
occurrences of a variable in a formula are defined inductively as
follows:


1. A is atomic: all variable occurrences in A are free.


2. A = ¬B: the free variable occurrences of A are exactly
those of B.


3. A = (B ∗ C), where ∗ is one of ∧, ∨, →: the free variable
occurrences of A are those in B together with those in C.


4. A = ∀x B: the free variable occurrences in A are all of
those in B except for occurrences of x.


5. A = ∃x B: the free variable occurrences in A are all of
those in B except for occurrences of x.


Definition 6.34 (Bound Variables). An occurrence of a vari- 
able in a formula A is bound if it is not free. 


Definition 6.35 (Scope). If ∀x B is an occurrence of a subformula
in a formula A, then the corresponding occurrence of B
in A is called the scope of the corresponding occurrence of ∀x.
Similarly for ∃x.

If B is the scope of a quantifier occurrence ∀x or ∃x in A, then
the free occurrences of x in B are bound in ∀x B and ∃x B. We
say that these occurrences are bound by the mentioned quantifier
occurrence.


Example 6.36. Consider the following formula:


∃v₀ A²₀(v₀, v₁)


Here B, the subformula A²₀(v₀, v₁), represents the scope of ∃v₀.
The quantifier binds the occurrence of v₀ in B, but does not bind
the occurrence of v₁. So v₁ is a free variable in this case.

We can now see how this might work in a more complicated
formula A:


∀v₀ (A¹₀(v₀) → A²₀(v₀, v₁)) → ∃v₁ (A²₁(v₀, v₁) ∨ ∀v₀ ¬A¹₁(v₀))


where B is (A¹₀(v₀) → A²₀(v₀, v₁)), C is (A²₁(v₀, v₁) ∨ ∀v₀ ¬A¹₁(v₀)),
and D is ¬A¹₁(v₀).


B is the scope of the first ∀v₀, C is the scope of ∃v₁, and D is the
scope of the second ∀v₀. The first ∀v₀ binds the occurrences of
v₀ in B, ∃v₁ binds the occurrence of v₁ in C, and the second ∀v₀
binds the occurrence of v₀ in D. The first occurrence of v₁ and
the fourth occurrence of v₀ are free in A. The last occurrence of
v₀ is free in D, but bound in C and A.
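

The inductive clauses of Definition 6.33 translate directly into a
recursive function. The sketch below is illustrative only: the tuple
encoding of terms and formulas and the predicate names are our own
assumptions, and the function returns the set of variables that have
at least one free occurrence, rather than tracking individual
occurrences.

```python
# Assumed encoding (not from the text):
#   terms:    ("var", name) | ("const", name) | ("fn", name, [args])
#   formulas: ("atom", pred, [terms]) | ("not", B)
#             | ("and" | "or" | "imp", B, C) | ("all" | "ex", var_name, B)


def term_vars(t):
    """Set of variables occurring in the term t."""
    if t[0] == "var":
        return {t[1]}
    if t[0] == "const":
        return set()
    return set().union(*(term_vars(a) for a in t[2]))


def free_vars(f):
    """Variables with at least one free occurrence in f."""
    kind = f[0]
    if kind == "atom":
        return set().union(*(term_vars(t) for t in f[2]))
    if kind == "not":
        return free_vars(f[1])
    if kind in ("and", "or", "imp"):
        return free_vars(f[1]) | free_vars(f[2])
    if kind in ("all", "ex"):
        return free_vars(f[2]) - {f[1]}  # the quantifier binds its variable
    raise ValueError(f"unknown formula kind: {kind!r}")


def is_sentence(f):
    """A formula with no free variable occurrences."""
    return not free_vars(f)


v0, v1 = ("var", "v0"), ("var", "v1")
small = ("ex", "v0", ("atom", "A20", [v0, v1]))

# The larger formula of the example, as we read it.
B = ("imp", ("atom", "A10", [v0]), ("atom", "A20", [v0, v1]))
D = ("not", ("atom", "A11", [v0]))
C = ("or", ("atom", "A21", [v0, v1]), ("all", "v0", D))
A = ("imp", ("all", "v0", B), ("ex", "v1", C))
```

On this encoding, `free_vars(small)` contains only `v1`, and
`free_vars(A)` contains both `v0` and `v1`, in line with the
discussion of free occurrences above.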


6.9 Substitution 


Definition 6.38 (Substitution in a term). We define s[t/x],
the result of substituting t for every occurrence of x in s,
recursively:


1. s = c: s[t/x] is just s.


2. s = y: s[t/x] is also just s, provided y is a variable and
y ≠ x.


3. s = x: s[t/x] is t.


4. s = f(t₁, …, tₙ): s[t/x] is f(t₁[t/x], …, tₙ[t/x]).


Definition 6.39. A term t is free for x in A if none of the free
occurrences of x in A occur in the scope of a quantifier that binds
a variable in t.


Example 6.40.

1. v₈ is free for v₁ in ∃v₃ A²₄(v₃, v₁).


2. f²₁(v₁, v₂) is not free for v₀ in ∀v₂ A²₄(v₀, v₂).


Definition 6.41 (Substitution in a formula). If A is a formula,
x is a variable, and t is a term free for x in A, then A[t/x] is the
result of substituting t for all free occurrences of x in A.


1. A = ⊥: A[t/x] is ⊥.

2. A = P(t₁, …, tₙ): A[t/x] is P(t₁[t/x], …, tₙ[t/x]).

3. A = t₁ = t₂: A[t/x] is t₁[t/x] = t₂[t/x].

4. A = ¬B: A[t/x] is ¬B[t/x].

5. A = (B ∧ C): A[t/x] is (B[t/x] ∧ C[t/x]).

6. A = (B ∨ C): A[t/x] is (B[t/x] ∨ C[t/x]).

7. A = (B → C): A[t/x] is (B[t/x] → C[t/x]).


8. A = ∀y B: A[t/x] is ∀y B[t/x], provided y is a variable
other than x; otherwise A[t/x] is just A.


9. A = ∃y B: A[t/x] is ∃y B[t/x], provided y is a variable
other than x; otherwise A[t/x] is just A.


Note that substitution may be vacuous: If x does not occur in 
A at all, then A[t/x] is just A. 

The restriction that t must be free for x in A is necessary
to exclude cases like the following. If A = ∃y x < y and t = y,
then A[t/x] would be ∃y y < y. In this case the free variable y
is “captured” by the quantifier ∃y upon substitution, and that is
undesirable. For instance, we would like it to be the case that
whenever ∀x B holds, so does B[t/x]. But consider ∀x ∃y x < y
(here B is ∃y x < y). It is a sentence that is true about, e.g., the
natural numbers: for every number x there is a number y greater
than it. If we allowed y as a possible substitution for x, we would
end up with B[y/x] = ∃y y < y, which is false. We prevent this by
requiring that none of the free variables in t would end up being
bound by a quantifier in A.
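
The following Python sketch puts Definitions 6.38, 6.39, and 6.41
together and shows the capture problem being detected. It is an
illustration under assumptions of our own: the tuple encoding of
terms and formulas, the function names, and the choice to raise an
error when the term is not free for the variable are not taken from
the text.

```python
# Assumed encoding (not from the text):
#   terms:    ("var", name) | ("const", name) | ("fn", name, [args])
#   formulas: ("atom", pred, [terms]) | ("not", B)
#             | ("and" | "or" | "imp", B, C) | ("all" | "ex", var_name, B)
BINARY = ("and", "or", "imp")


def term_vars(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "const":
        return set()
    return set().union(*(term_vars(a) for a in t[2]))


def free_vars(f):
    kind = f[0]
    if kind == "atom":
        return set().union(*(term_vars(t) for t in f[2]))
    if kind == "not":
        return free_vars(f[1])
    if kind in BINARY:
        return free_vars(f[1]) | free_vars(f[2])
    return free_vars(f[2]) - {f[1]}


def subst_term(s, t, x):
    """s[t/x], following Definition 6.38."""
    if s[0] == "var":
        return t if s[1] == x else s
    if s[0] == "const":
        return s
    return ("fn", s[1], [subst_term(a, t, x) for a in s[2]])


def free_for(t, x, f):
    """Definition 6.39: no free occurrence of x in f lies in the scope
    of a quantifier that binds a variable of t."""
    kind = f[0]
    if kind == "atom":
        return True
    if kind == "not":
        return free_for(t, x, f[1])
    if kind in BINARY:
        return free_for(t, x, f[1]) and free_for(t, x, f[2])
    y, body = f[1], f[2]
    if y == x or x not in free_vars(body):
        return True  # x has no free occurrence inside this quantifier
    return y not in term_vars(t) and free_for(t, x, body)


def subst(f, t, x):
    """A[t/x], following Definition 6.41; rejects capturing substitutions."""
    if not free_for(t, x, f):
        raise ValueError("term is not free for the variable in this formula")
    kind = f[0]
    if kind == "atom":
        return ("atom", f[1], [subst_term(a, t, x) for a in f[2]])
    if kind == "not":
        return ("not", subst(f[1], t, x))
    if kind in BINARY:
        return (kind, subst(f[1], t, x), subst(f[2], t, x))
    y, body = f[1], f[2]
    return f if y == x else (kind, y, subst(body, t, x))


x, y, z = ("var", "x"), ("var", "y"), ("var", "z")
A = ("ex", "y", ("atom", "<", [x, y]))  # the formula ∃y x < y
```

With this encoding, `subst(A, z, "x")` yields the encoding of
∃y z < y, whereas `subst(A, y, "x")` raises an error because y is not
free for x in A.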

We often use the following convention to avoid cumbersome
notation: If A is a formula which may contain the variable x free,
we also write A(x) to indicate this. When it is clear which A
and x we have in mind, and t is a term (assumed to be free for x
in A(x)), then we write A(t) as short for A[t/x]. So for instance,
we might say, “we call A(t) an instance of ∀x A(x).” By this we
mean that if A is any formula, x a variable, and t a term that’s
free for x in A, then A[t/x] is an instance of ∀x A.


Summary 


A first-order language consists of constant, function, and
predicate symbols. Function and predicate symbols take a
specified number of arguments. In the language of arithmetic, e.g.,
we have a single constant symbol 0, one 1-place function
symbol ′, two 2-place function symbols + and ×, and one 2-place
predicate symbol <. From variables and constant and function
symbols we form the terms of a language. From the terms of
a language together with its predicate symbols, as well as the
identity symbol =, we form the atomic formulas. And in turn
from them, using the logical connectives ¬, ∨, ∧, →, ↔ and the
quantifiers ∀ and ∃ we form its formulas. Since we are careful to
always include necessary parentheses in the process of forming
terms and formulas, there is always exactly one way of reading a
formula. This makes it possible to define things by induction on
the structure of formulas.

Occurrences of variables in formulas are sometimes governed
by a corresponding quantifier: if a variable occurs in the scope
of a quantifier for that variable it is considered bound, otherwise
free. These concepts all have inductive definitions, and we also
inductively define the operation of substitution of a term for a
variable in a formula. Formulas without free variable occurrences
are called sentences.

Problems 
Problem 6.1. Prove Lemma 6.8. 
Problem 6.2. Prove that for any term t, l(t) = r(t).


Problem 6.3. Prove Lemma 6.12. 


Problem 6.4. Prove Proposition 6.13 (Hint: Formulate and 
prove a version of Lemma 6.12 for terms.) 


Problem 6.5. Prove Proposition 6.19. 
Problem 6.6. Prove Proposition 6.20. 
Problem 6.7. Prove Lemma 6.28. 


Problem 6.8. Prove Proposition 6.30. Hint: use a similar strat- 
egy to that used in the proof of Theorem 6.29. 


Problem 6.9. Prove Proposition 6.32. 


Problem 6.10. Give an inductive definition of the bound vari- 
able occurrences along the lines of Definition 6.33. 


Semantics of First-Order Logic


7.1 Introduction 


Giving the meaning of expressions is the domain of semantics. 
The central concept in semantics is that of satisfaction in a struc- 
ture. A structure gives meaning to the building blocks of the 
language: a domain is a non-empty set of objects. The quanti- 
fiers are interpreted as ranging over this domain, constant sym- 
bols are assigned elements in the domain, function symbols are 
assigned functions from the domain to itself, and predicate sym- 
bols are assigned relations on the domain. The domain together 
with assignments to the basic vocabulary constitutes a structure. 
Variables may appear in formulas, and in order to give a seman- 
tics, we also have to assign elements of the domain to them—this 
is a variable assignment. The satisfaction relation, finally, brings 
these together. A formula may be satisfied in a structure M
relative to a variable assignment s, written as M, s ⊨ A. This relation
is also defined by induction on the structure of A, using the truth
tables for the logical connectives to define, say, satisfaction of
(A ∧ B) in terms of satisfaction (or not) of A and B. It then turns
out that the variable assignment is irrelevant if the formula A
is a sentence, i.e., has no free variables, and so we can talk of
sentences being simply satisfied (or not) in structures.

On the basis of the satisfaction relation M ⊨ A for sentences
we can then define the basic semantic notions of validity,
entailment, and satisfiability. A sentence is valid, ⊨ A, if every
structure satisfies it. It is entailed by a set of sentences, Γ ⊨ A, if every
structure that satisfies all the sentences in Γ also satisfies A. And
a set of sentences is satisfiable if some structure satisfies all sen- 
tences in it at the same time. Because formulas are inductively 
defined, and satisfaction is in turn defined by induction on the 
structure of formulas, we can use induction to prove properties 
of our semantics and to relate the semantic notions defined. 


7.2 Structures for First-order Languages 


First-order languages are, by themselves, uninterpreted: the con- 
stant symbols, function symbols, and predicate symbols have no 
specific meaning attached to them. Meanings are given by spec- 
ifying a structure. It specifies the domain, i.e., the objects which 
the constant symbols pick out, the function symbols operate on, 
and the quantifiers range over. In addition, it specifies which 
constant symbols pick out which objects, how a function symbol 
maps objects to objects, and which objects the predicate symbols 
apply to. Structures are the basis for semantic notions in logic, 
e.g., the notion of consequence, validity, satisfiability. They are 
variously called “structures,” “interpretations,” or “models” in 
the literature. 


Definition 7.1 (Structures). A structure M for a language ℒ of
first-order logic consists of the following elements:


1. Domain: a non-empty set, |M|


2. Interpretation of constant symbols: for each constant symbol c
of ℒ, an element c^M ∈ |M|


3. Interpretation of predicate symbols: for each n-place predicate
symbol R of ℒ (other than =), an n-place relation R^M ⊆ |M|ⁿ


4. Interpretation of function symbols: for each n-place function
symbol f of ℒ, an n-place function f^M: |M|ⁿ → |M|


Example 7.2. A structure M for the language of arithmetic
consists of a set |M|, an element of |M|, 0^M, as interpretation of the
constant symbol 0, a one-place function ′^M: |M| → |M|, two
two-place functions +^M and ×^M, both |M|² → |M|, and a
two-place relation <^M ⊆ |M|².

An obvious example of such a structure is the following:


1. |N| = ℕ

2. 0^N = 0

3. ′^N(n) = n + 1 for all n ∈ ℕ

4. +^N(n, m) = n + m for all n, m ∈ ℕ

5. ×^N(n, m) = n · m for all n, m ∈ ℕ

6. <^N = {⟨n, m⟩ : n ∈ ℕ, m ∈ ℕ, n < m}


The structure N for ℒ_A so defined is called the standard model of
arithmetic, because it interprets the non-logical constants of ℒ_A
exactly how you would expect.

However, there are many other possible structures for ℒ_A. For
instance, we might take as the domain the set ℤ of integers instead
of ℕ, and define the interpretations of 0, ′, +, ×, < accordingly.
But we can also define structures for ℒ_A which have nothing even
remotely to do with numbers.


Example 7.3. A structure M for the language ℒ_Z of set theory
requires just a set and a single two-place relation. So technically,
e.g., the set of people plus the relation “x is older than y” could
be used as a structure for ℒ_Z, as well as ℕ together with n > m
for n, m ∈ ℕ.

A particularly interesting structure for ℒ_Z in which the
elements of the domain are actually sets, and the interpretation of ∈
actually is the relation “x is an element of y” is the structure HF
of hereditarily finite sets:


1. |HF| = ∅ ∪ ℘(∅) ∪ ℘(℘(∅)) ∪ ℘(℘(℘(∅))) ∪ …;


2. ∈^HF = {⟨x, y⟩ : x, y ∈ |HF|, x ∈ y}.
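

The domain of HF is infinite, but any finite stage of the union can
be computed. The following Python sketch builds the first few
iterated power sets of the empty set and the membership relation
restricted to them. The function names and the cut-off after four
power-set steps are our own illustrative choices.

```python
from itertools import combinations


def powerset(s):
    """All subsets of the frozenset s, as a frozenset of frozensets."""
    items = list(s)
    return frozenset(
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    )


def hf_levels(depth):
    """The sets ∅, ℘(∅), ℘(℘(∅)), ... for the given number of steps."""
    levels = [frozenset()]
    for _ in range(depth):
        levels.append(powerset(levels[-1]))
    return levels


levels = hf_levels(4)
domain = frozenset().union(*levels)  # a finite fragment of |HF|

# The interpretation of ∈ restricted to this fragment.
member = {(u, v) for u in domain for v in domain if u in v}

print([len(level) for level in levels])
```

Each level is the power set of the previous one, so the sizes grow as
0, 1, 2, 4, 16, and the next level would already have 65536 elements.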


The stipulations we make as to what counts as a structure
impact our logic. For example, the choice to prevent empty
domains ensures, given the usual account of satisfaction (or truth)
for quantified sentences, that ∃x (A(x) ∨ ¬A(x)) is valid, that
is, a logical truth. And the stipulation that all constant symbols
must refer to an object in the domain ensures that the existential
generalization is a sound pattern of inference: A(a), therefore
∃x A(x). If we allowed names to refer outside the domain, or to
not refer, then we would be on our way to a free logic, in which
existential generalization requires an additional premise: A(a) and
∃x x = a, therefore ∃x A(x).


7.3 Covered Structures for First-order 
Languages 


Recall that a term is closed if it contains no variables. 


Definition 7.4 (Value of closed terms). If t is a closed term of
the language ℒ and M is a structure for ℒ, the value Val^M(t) is
defined as follows:


1. If t is just the constant symbol c, then Val^M(c) = c^M.


2. If t is of the form f(t₁, …, tₙ), then


Val^M(t) = f^M(Val^M(t₁), …, Val^M(tₙ)).


Definition 7.5 (Covered structure). A structure is covered if ev- 
ery element of the domain is the value of some closed term. 


Example 7.6. Let ℒ be the language with constant symbols
zero, one, two, ..., the binary predicate symbol <, and the
binary function symbols + and ×. Then a structure M for ℒ is the
one with domain |M| = {0, 1, 2, ...} and assignments zero^M = 0,
one^M = 1, two^M = 2, and so forth. For the binary relation
symbol <, the set <^M is the set of all pairs ⟨c₁, c₂⟩ ∈ |M|²
such that c₁ is less than c₂: for example, ⟨1, 3⟩ ∈ <^M but
⟨2, 2⟩ ∉ <^M. For the binary function symbol +, define +^M in
the usual way: for example, +^M maps ⟨2, 3⟩ to 5, and similarly
for the binary function symbol ×. Hence, the value of four is
just 4, and the value of ×(two, +(three, zero)) (or in infix
notation, two × (three + zero)) is


Val^M(×(two, +(three, zero)))
  = ×^M(Val^M(two), Val^M(+(three, zero)))
  = ×^M(Val^M(two), +^M(Val^M(three), Val^M(zero)))
  = ×^M(two^M, +^M(three^M, zero^M))
  = ×^M(2, +^M(3, 0))
  = ×^M(2, 3)
  = 6
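

The recursion of Definition 7.4 can be mirrored in a few lines of
Python. This is a minimal sketch under assumptions of our own: closed
terms are encoded as nested tuples, the function symbols + and × are
named `plus` and `times`, and only the first few constant symbols of
the example are listed.

```python
# Assumed encoding (not from the text): a closed term is either a
# constant symbol (a string) or a tuple (function_symbol, arg1, ..., argn).
M = {
    "constants": {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4},
    "functions": {
        "plus": lambda m, n: m + n,
        "times": lambda m, n: m * n,
    },
}


def value(t, structure):
    """Val^M(t) for a closed term t, following Definition 7.4."""
    if isinstance(t, str):
        # Clause 1: a constant symbol denotes its interpretation.
        return structure["constants"][t]
    # Clause 2: apply the interpreted function to the argument values.
    symbol, *args = t
    return structure["functions"][symbol](*(value(a, structure) for a in args))


term = ("times", "two", ("plus", "three", "zero"))
print(value(term, M))
```

Evaluating `term` retraces the derivation above step by step and
returns 6.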


7.4 Satisfaction of a Formula in a Structure 


The basic notions that relate expressions such as terms and
formulas, on the one hand, and structures on the other, are those
of value of a term and satisfaction of a formula. Informally, the
value of a term is an element of a structure: if the term is just a
constant, its value is the object assigned to the constant by the
structure, and if it is built up using function symbols, the value is
computed from the values of constants and the functions assigned
to the function symbols in the term. A formula is satisfied in a structure
if the interpretation given to the predicates makes the formula 
true in the domain of the structure. This notion of satisfaction 
is specified inductively: the specification of the structure directly 
states when atomic formulas are satisfied, and we define when a 
complex formula is satisfied depending on the main connective 
or quantifier and whether or not the immediate subformulas are 
satisfied. 

The case of the quantifiers here is a bit tricky, as the imme- 
diate subformula of a quantified formula has a free variable, and 
structures don’t specify the values of variables. In order to deal 
with this difficulty, we also introduce variable assignments and de- 
fine satisfaction not with respect to a structure alone, but with 
respect to a structure plus a variable assignment. 


Definition 7.7 (Variable Assignment). A variable assignment s
for a structure M is a function which maps each variable
to an element of |M|, i.e., s: Var → |M|.


A structure assigns a value to each constant symbol, and a 
variable assignment to each variable. But we want to use terms 
built up from them to also name elements of the domain. For 
this we define the value of terms inductively. For constant sym- 
bols and variables the value is just as the structure or the variable 
assignment specifies it; for more complex terms it is computed re- 
cursively using the functions the structure assigns to the function 
symbols. 


Definition 7.8 (Value of Terms). If t is a term of the
language ℒ, M is a structure for ℒ, and s is a variable assignment
for M, the value Val^M_s(t) is defined as follows:


1. t = c: Val^M_s(t) = c^M.


2. t = x: Val^M_s(t) = s(x).


3. t = f(t₁, …, tₙ):

Val^M_s(t) = f^M(Val^M_s(t₁), …, Val^M_s(tₙ)).


Definition 7.9 (x-Variant). If s is a variable assignment for
a structure M, then any variable assignment s′ for M which
differs from s at most in what it assigns to x is called an x-variant
of s. If s′ is an x-variant of s we write s′ ∼ₓ s.


Note that an x-variant of an assignment s does not have to 
assign something different to x. In fact, every assignment counts 
as an x-variant of itself. 


Definition 7.10. If s is a variable assignment for a structure M
and m ∈ |M|, then the assignment s[m/x] is the variable
assignment defined by


s[m/x](y) = m       if y = x
s[m/x](y) = s(y)    otherwise.


In other words, s[m/x] is the particular x-variant of s which 
assigns the domain element m to x, and assigns the same things 
to variables other than x that s does. 


Definition 7.11 (Satisfaction). Satisfaction of a formula A in
a structure M relative to a variable assignment s, in symbols:
M, s ⊨ A, is defined recursively as follows. (We write M, s ⊭ A to
mean “not M, s ⊨ A.”)


1. A = ⊥: M, s ⊭ A.


2. A = R(t₁, …, tₙ): M, s ⊨ A iff ⟨Val^M_s(t₁), …, Val^M_s(tₙ)⟩ ∈ R^M.


3. A = t₁ = t₂: M, s ⊨ A iff Val^M_s(t₁) = Val^M_s(t₂).


4. A = ¬B: M, s ⊨ A iff M, s ⊭ B.

5. A = (B ∧ C): M, s ⊨ A iff M, s ⊨ B and M, s ⊨ C.

6. A = (B ∨ C): M, s ⊨ A iff M, s ⊨ B or M, s ⊨ C (or both).

7. A = (B → C): M, s ⊨ A iff M, s ⊭ B or M, s ⊨ C (or both).


8. A = ∀x B: M, s ⊨ A iff for every element m ∈ |M|,
M, s[m/x] ⊨ B.


9. A = ∃x B: M, s ⊨ A iff for at least one element m ∈ |M|,
M, s[m/x] ⊨ B.


The variable assignments are important in the last two
clauses. We cannot define satisfaction of ∀x B(x) by “for all
m ∈ |M|, M ⊨ B(m).” We cannot define satisfaction of ∃x B(x)
by “for at least one m ∈ |M|, M ⊨ B(m).” The reason is that
if m ∈ |M|, it is not a symbol of the language, and so B(m) is
not a formula (that is, B[m/x] is undefined). We also cannot
assume that we have constant symbols or terms available that
name every element of M, since there is nothing in the definition
of structures that requires it. In the standard language, the set of
constant symbols is countably infinite, so if |M| is not countable
there aren’t even enough constant symbols to name every object.

We solve this problem by introducing variable assignments,
which allow us to link variables directly with elements of the
domain. Then instead of saying that, e.g., ∃x B(x) is satisfied in M
iff B(m) is satisfied for at least one m ∈ |M|, we say it is satisfied
in M relative to s iff B(x) is satisfied relative to s[m/x] for at
least one m ∈ |M|.


Example 7.12. Let ℒ = {a, b, f, R} where a and b are constant
symbols, f is a two-place function symbol, and R is a two-place
predicate symbol. Consider the structure M defined by:

1. |M| = {1, 2, 3, 4}

2. a^M = 1

3. b^M = 2

4. f^M(x, y) = x + y if x + y ≤ 3 and = 3 otherwise.

5. R^M = {⟨1, 1⟩, ⟨1, 2⟩, ⟨2, 3⟩, ⟨2, 4⟩}


The function s(x) = 1 that assigns 1 ∈ |M| to every variable is a
variable assignment for M.
Then


Val^M_s(f(a, b)) = f^M(Val^M_s(a), Val^M_s(b)).


Since a and b are constant symbols, Val^M_s(a) = a^M = 1 and
Val^M_s(b) = b^M = 2. So


Val^M_s(f(a, b)) = f^M(1, 2) = 1 + 2 = 3.

To compute the value of f(f(a, b), a) we have to consider

Val^M_s(f(f(a, b), a)) = f^M(Val^M_s(f(a, b)), Val^M_s(a)) = f^M(3, 1) = 3,

since 3 + 1 > 3. Since s(x) = 1 and Val^M_s(x) = s(x), we also have


Val^M_s(f(f(a, b), x)) = f^M(Val^M_s(f(a, b)), Val^M_s(x)) = f^M(3, 1) = 3.


An atomic formula R(t₁, t₂) is satisfied if the tuple of values of
its arguments, i.e., ⟨Val^M_s(t₁), Val^M_s(t₂)⟩, is an element of R^M. So,
e.g., we have M, s ⊨ R(b, f(a, b)) since ⟨Val^M_s(b), Val^M_s(f(a, b))⟩ =
⟨2, 3⟩ ∈ R^M, but M, s ⊭ R(x, f(a, b)) since ⟨1, 3⟩ ∉ R^M.

To determine if a non-atomic formula A is satisfied, you apply
the clause in the inductive definition that applies to the main
connective. For instance, the main connective in
R(a, a) → (R(b, x) ∨ R(x, b)) is the →, and


M, s ⊨ R(a, a) → (R(b, x) ∨ R(x, b)) iff
M, s ⊭ R(a, a) or M, s ⊨ R(b, x) ∨ R(x, b)

Since M, s ⊨ R(a, a) (because ⟨1, 1⟩ ∈ R^M) we can’t yet determine
the answer and must first figure out if M, s ⊨ R(b, x) ∨ R(x, b):


M, s ⊨ R(b, x) ∨ R(x, b) iff
M, s ⊨ R(b, x) or M, s ⊨ R(x, b)


And this is the case, since M, s ⊨ R(x, b) (because ⟨1, 2⟩ ∈ R^M).


Recall that an x-variant of s is a variable assignment that
differs from s at most in what it assigns to x. For every element
of |M|, there is an x-variant of s:


s₁ = s[1/x], s₂ = s[2/x],
s₃ = s[3/x], s₄ = s[4/x].


So, e.g., s₂(x) = 2 and s₂(y) = s(y) = 1 for all variables y other
than x. These are all the x-variants of s for the structure M, since
|M| = {1, 2, 3, 4}. Note, in particular, that s₁ = s (s is always an
x-variant of itself).

To determine if an existentially quantified formula ∃x A(x) is
satisfied, we have to determine if M, s[m/x] ⊨ A(x) for at least
one m ∈ |M|. So,


M, s ⊨ ∃x (R(b, x) ∨ R(x, b)),


since M, s[1/x] ⊨ R(b, x) ∨ R(x, b) (s[3/x] would also fit the bill).
But,

M, s ⊭ ∃x (R(b, x) ∧ R(x, b))


since, whichever m ∈ |M| we pick, M, s[m/x] ⊭ R(b, x) ∧ R(x, b).

To determine if a universally quantified formula ∀x A(x) is
satisfied, we have to determine if M, s[m/x] ⊨ A(x) for all
m ∈ |M|. So,

M, s ⊨ ∀x (R(x, a) → R(a, x)),


since M, s[m/x] ⊨ R(x, a) → R(a, x) for all m ∈ |M|. For m = 1,
we have M, s[1/x] ⊨ R(a, x) so the consequent is true; for m = 2,
3, and 4, we have M, s[m/x] ⊭ R(x, a), so the antecedent is false.
But,

M, s ⊭ ∀x (R(a, x) → R(x, a))


since M, s[2/x] ⊭ R(a, x) → R(x, a) (because M, s[2/x] ⊨ R(a, x)
and M, s[2/x] ⊭ R(x, a)).


For a more complicated case, consider

∀x (R(a, x) → ∃y R(x, y)).


Since M, s[3/x] ⊭ R(a, x) and M, s[4/x] ⊭ R(a, x), the
interesting cases where we have to worry about the consequent of
the conditional are only m = 1 and m = 2. Does M, s[1/x] ⊨
∃y R(x, y) hold? It does if there is at least one n ∈ |M| so
that M, s[1/x][n/y] ⊨ R(x, y). In fact, if we take n = 1, we
have s[1/x][n/y] = s[1/y] = s. Since s(x) = 1, s(y) = 1, and
⟨1, 1⟩ ∈ R^M, the answer is yes.

To determine if M, s[2/x] ⊨ ∃y R(x, y), we have to look
at the variable assignments s[2/x][n/y]. Here, for n = 1,
this assignment is s₂ = s[2/x], which does not satisfy R(x, y)
(s₂(x) = 2, s₂(y) = 1, and ⟨2, 1⟩ ∉ R^M). However, consider
s[2/x][3/y] = s₂[3/y]. M, s₂[3/y] ⊨ R(x, y) since ⟨2, 3⟩ ∈ R^M,
and so M, s₂ ⊨ ∃y R(x, y).

So, for all m ∈ |M|, either M, s[m/x] ⊭ R(a, x) (if m = 3, 4) or
M, s[m/x] ⊨ ∃y R(x, y) (if m = 1, 2), and so


M, s ⊨ ∀x (R(a, x) → ∃y R(x, y)).

On the other hand,

M, s ⊭ ∃x (R(a, x) ∧ ∀y R(x, y)).


We have M, s[m/x] ⊨ R(a, x) only for m = 1 and m = 2. But for
both of these values of m, there is in turn an n ∈ |M| so that
M, s[m/x][n/y] ⊭ R(x, y), namely n = 4 for m = 1 (since
⟨1, 4⟩ ∉ R^M) and n = 1 for m = 2 (since ⟨2, 1⟩ ∉ R^M), and so
M, s[m/x] ⊭ ∀y R(x, y) for m = 1 and m = 2. In sum, there is no
m ∈ |M| such that M, s[m/x] ⊨ R(a, x) ∧ ∀y R(x, y).
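

Because the domain of M is finite, the satisfaction clauses of
Definition 7.11 can be evaluated by brute force, which gives a way to
double-check the claims of this example. The sketch below is an
illustration only: the tuple encoding of terms and formulas is our
own assumption, and the variable assignment is represented by a
dictionary that lists just the variables x and y.

```python
# The structure M of Example 7.12.
DOMAIN = {1, 2, 3, 4}
CONSTANTS = {"a": 1, "b": 2}
FUNCTIONS = {"f": lambda m, n: m + n if m + n <= 3 else 3}
RELATIONS = {"R": {(1, 1), (1, 2), (2, 3), (2, 4)}}


def val(t, s):
    """Value of a term relative to the assignment s (Definition 7.8)."""
    if t[0] == "var":
        return s[t[1]]
    if t[0] == "const":
        return CONSTANTS[t[1]]
    return FUNCTIONS[t[1]](*(val(u, s) for u in t[2]))


def sat(f, s):
    """Satisfaction relative to s (Definition 7.11, without ⊥ and =)."""
    kind = f[0]
    if kind == "atom":
        return tuple(val(t, s) for t in f[2]) in RELATIONS[f[1]]
    if kind == "not":
        return not sat(f[1], s)
    if kind == "and":
        return sat(f[1], s) and sat(f[2], s)
    if kind == "or":
        return sat(f[1], s) or sat(f[2], s)
    if kind == "imp":
        return (not sat(f[1], s)) or sat(f[2], s)
    if kind == "all":  # {**s, x: m} plays the role of s[m/x]
        return all(sat(f[2], {**s, f[1]: m}) for m in DOMAIN)
    if kind == "ex":
        return any(sat(f[2], {**s, f[1]: m}) for m in DOMAIN)
    raise ValueError(f"unknown formula kind: {kind!r}")


a, b = ("const", "a"), ("const", "b")
x, y = ("var", "x"), ("var", "y")


def R(t1, t2):
    return ("atom", "R", [t1, t2])


s = {"x": 1, "y": 1}  # s assigns 1 to every variable we use

claim_true = ("all", "x", ("imp", R(a, x), ("ex", "y", R(x, y))))
claim_false = ("ex", "x", ("and", R(a, x), ("all", "y", R(x, y))))
print(sat(claim_true, s), sat(claim_false, s))
```

Here `sat(claim_true, s)` evaluates to `True` and
`sat(claim_false, s)` to `False`, matching the last two results of the
example. The other formulas discussed above can be checked the same
way.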
7.5 Variable Assignments


A variable assignment s provides a value for every variable, and
there are infinitely many of them. This is of course not
necessary. We require variable assignments to assign values to all
variables simply because it makes things a lot easier. The value of a
term t, and whether or not a formula A is satisfied in a structure
with respect to s, only depend on the assignments s makes to
the variables in t and the free variables of A. This is the content
of the next two propositions. To make the idea of “depends on”
precise, we show that any two variable assignments that agree on
all the variables in t give the same value, and that if two variable
assignments agree on all free variables of A, then A is satisfied
relative to one iff it is satisfied relative to the other.


Proposition 7.13. If the variables in a term t are among x₁, …, xₙ,
and s₁(xᵢ) = s₂(xᵢ) for i = 1, …, n, then Val^M_{s₁}(t) = Val^M_{s₂}(t).


Proof. By induction on the complexity of t. For the base case, t
can be a constant symbol or one of the variables x₁, …, xₙ. If
t = c, then Val^M_{s₁}(t) = c^M = Val^M_{s₂}(t). If t = xᵢ, s₁(xᵢ) = s₂(xᵢ)
by the hypothesis of the proposition, and so Val^M_{s₁}(t) = s₁(xᵢ) =
s₂(xᵢ) = Val^M_{s₂}(t).

For the inductive step, assume that t = f(t₁, …, tₖ) and that
the claim holds for t₁, …, tₖ. Then


Val^M_{s₁}(t) = Val^M_{s₁}(f(t₁, …, tₖ))
  = f^M(Val^M_{s₁}(t₁), …, Val^M_{s₁}(tₖ))


For j = 1, …, k, the variables of tⱼ are among x₁, …, xₙ. By
induction hypothesis, Val^M_{s₁}(tⱼ) = Val^M_{s₂}(tⱼ). So,


Val^M_{s₁}(t) = Val^M_{s₁}(f(t₁, …, tₖ))
  = f^M(Val^M_{s₁}(t₁), …, Val^M_{s₁}(tₖ))
  = f^M(Val^M_{s₂}(t₁), …, Val^M_{s₂}(tₖ))
  = Val^M_{s₂}(f(t₁, …, tₖ)) = Val^M_{s₂}(t). □


Proposition 7.14. Suppose the free variables in A are among
x₁, …, xₙ. If s₁(xᵢ) = s₂(xᵢ) for i = 1, …, n, then M, s₁ ⊨ A iff
M, s₂ ⊨ A.


Proof. We use induction on the complexity of A. For the base
case, where A is atomic, A can be: ⊥, R(t₁, …, tₖ) for a k-place
predicate R and terms t₁, …, tₖ, or t₁ = t₂ for terms t₁ and t₂.


1. A = ⊥: both M, s₁ ⊭ A and M, s₂ ⊭ A.

2. A = R(t₁, …, tₖ): let M, s₁ ⊨ A. Then

⟨Val^M_{s₁}(t₁), …, Val^M_{s₁}(tₖ)⟩ ∈ R^M.


For i = 1, …, k, Val^M_{s₁}(tᵢ) = Val^M_{s₂}(tᵢ) by Proposition 7.13.
So we also have ⟨Val^M_{s₂}(t₁), …, Val^M_{s₂}(tₖ)⟩ ∈ R^M.


3. A = t₁ = t₂: suppose M, s₁ ⊨ A. Then Val^M_{s₁}(t₁) = Val^M_{s₁}(t₂).
So,


Val^M_{s₂}(t₁) = Val^M_{s₁}(t₁)    (by Proposition 7.13)
  = Val^M_{s₁}(t₂)    (since M, s₁ ⊨ t₁ = t₂)
  = Val^M_{s₂}(t₂)    (by Proposition 7.13),


so M, s₂ ⊨ t₁ = t₂.


Now assume M, s₁ ⊨ B iff M, s₂ ⊨ B for all formulas B less
complex than A. The induction step proceeds by cases determined by
the main operator of A. In each case, we only demonstrate the
forward direction of the biconditional; the proof of the reverse
direction is symmetrical. In all cases except those for the
quantifiers, we apply the induction hypothesis to sub-formulas B of A.
The free variables of B are among those of A. Thus, if s₁ and s₂
agree on the free variables of A, they also agree on those of B,
and the induction hypothesis applies to B.


1. A = ¬B: if M, s₁ ⊨ A, then M, s₁ ⊭ B, so by the induction
hypothesis, M, s₂ ⊭ B, hence M, s₂ ⊨ A.


2. A = B ∧ C: exercise.


3. A = B ∨ C: if M, s₁ ⊨ A, then M, s₁ ⊨ B or M, s₁ ⊨ C. By
induction hypothesis, M, s₂ ⊨ B or M, s₂ ⊨ C, so M, s₂ ⊨ A.


4. A = B → C: exercise.


5. A = ∃x B: if M, s₁ ⊨ A, there is an m ∈ |M| so that
M, s₁[m/x] ⊨ B. Let s₁′ = s₁[m/x] and s₂′ = s₂[m/x]. The
free variables of B are among x₁, …, xₙ, and x. s₁′(xᵢ) =
s₂′(xᵢ), since s₁′ and s₂′ are x-variants of s₁ and s₂,
respectively, and by hypothesis s₁(xᵢ) = s₂(xᵢ). s₁′(x) = s₂′(x) = m
by the way we have defined s₁′ and s₂′. Then the induction
hypothesis applies to B and s₁′, s₂′, so M, s₂′ ⊨ B. Hence, since
s₂′ = s₂[m/x], there is an m ∈ |M| such that M, s₂[m/x] ⊨ B,
and so M, s₂ ⊨ A.


6. A = ∀x B: exercise.


By induction, we get that M, s₁ ⊨ A iff M, s₂ ⊨ A whenever the free
variables in A are among x₁, …, xₙ and s₁(xᵢ) = s₂(xᵢ) for i = 1,
…, n. □


Sentences have no free variables, so any two variable assign- 
ments assign the same things to all the (zero) free variables of any 
sentence. The proposition just proved then means that whether 
or not a sentence is satisfied in a structure relative to a variable 
assignment is completely independent of the assignment. We’ll 
record this fact. It justifies the definition of satisfaction of a sen- 
tence in a structure (without mentioning a variable assignment) 
that follows. 


Corollary 7.15. If A is a sentence and s a variable assignment,
then M, s ⊨ A iff M, s′ ⊨ A for every variable assignment s′.


Proof. Let s′ be any variable assignment. Since A is a sentence, it
has no free variables, and so every variable assignment s′ trivially
assigns the same things to all free variables of A as does s. So the
condition of Proposition 7.14 is satisfied, and we have M, s ⊨ A
iff M, s′ ⊨ A. □


Definition 7.16. If A is a sentence, we say that a structure M
satisfies A, M ⊨ A, iff M, s ⊨ A for all variable assignments s.


If M ⊨ A, we also simply say that A is true in M.


Proof. Exercise. □


Proof. Exercise. □


7.6 Extensionality 


Extensionality, sometimes called relevance, can be expressed in- 
formally as follows: the only factors that bear upon the satisfac- 
tion of formula A in a structure M relative to a variable assign- 
ment s, are the size of the domain and the assignments made 
by M and s to the elements of the language that actually appear 
in A. 

One immediate consequence of extensionality is that where 
two structures M and M’ agree on all the elements of the lan- 
guage appearing in a sentence A and have the same domain, M 
and M’ must also agree on whether or not A itself is true. 


Proposition 7.19 (Extensionality). Let A be a formula, and M₁
and M₂ be structures with |M₁| = |M₂|, and s a variable assignment
on |M₁| = |M₂|. If c^{M₁} = c^{M₂}, R^{M₁} = R^{M₂}, and f^{M₁} = f^{M₂}
for every constant symbol c, relation symbol R, and function symbol f
occurring in A, then M₁,s ⊨ A iff M₂,s ⊨ A.


Proof. First prove (by induction on t) that for every term t,
Val^{M₁}_s(t) = Val^{M₂}_s(t). Then prove the proposition by induction
on A, making use of the claim just proved for the induction basis
(where A is atomic). □


Corollary 7.20 (Extensionality for Sentences). Let A be a sentence
and M₁, M₂ as in Proposition 7.19. Then M₁ ⊨ A iff M₂ ⊨ A.


Proof. Follows from Proposition 7.19 by Corollary 7.15. □
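

For finite structures, extensionality can be observed directly. The
following Python sketch evaluates satisfaction by recursion on the
formula. The tuple encoding of terms and formulas, the dictionary layout
of structures, and the names val, sat, M1, and M2 are assumptions made
for this illustration only; they are not notation from the text.

```python
def val(term, M, s):
    """Value of a term: ("var", v), ("const", c) or ("fn", f, arg, ...)."""
    if term[0] == "var":
        return s[term[1]]
    if term[0] == "const":
        return M["consts"][term[1]]
    return M["funcs"][term[1]](*(val(arg, M, s) for arg in term[2:]))

def sat(A, M, s):
    """Decide M,s |= A for a finite structure M by recursion on A."""
    op = A[0]
    if op == "rel":    # ("rel", R, t1, ..., tn)
        return tuple(val(t, M, s) for t in A[2:]) in M["rels"][A[1]]
    if op == "eq":     # ("eq", t1, t2)
        return val(A[1], M, s) == val(A[2], M, s)
    if op == "not":
        return not sat(A[1], M, s)
    if op == "and":
        return sat(A[1], M, s) and sat(A[2], M, s)
    if op == "or":
        return sat(A[1], M, s) or sat(A[2], M, s)
    if op == "imp":
        return (not sat(A[1], M, s)) or sat(A[2], M, s)
    if op == "all":    # ("all", x, B): every x-variant of s satisfies B
        return all(sat(A[2], M, {**s, A[1]: a}) for a in M["domain"])
    if op == "ex":     # ("ex", x, B): some x-variant of s satisfies B
        return any(sat(A[2], M, {**s, A[1]: a}) for a in M["domain"])
    raise ValueError(f"unknown formula: {A!r}")

# Two hypothetical structures with the same domain that agree on R
# but disagree on the constant c and the one-place relation Q.
R = {(0, 1), (1, 2), (2, 0)}
M1 = {"domain": {0, 1, 2}, "consts": {"c": 0}, "funcs": {},
      "rels": {"R": R, "Q": {(0,)}}}
M2 = {"domain": {0, 1, 2}, "consts": {"c": 2}, "funcs": {},
      "rels": {"R": R, "Q": set()}}

# A is "for all x there is y with R(x, y)"; only R occurs in it.
A = ("all", "x", ("ex", "y", ("rel", "R", ("var", "x"), ("var", "y"))))
# B is "Q(c)"; the structures disagree on both symbols occurring in it.
B = ("rel", "Q", ("const", "c"))

same_on_A = sat(A, M1, {}) == sat(A, M2, {})  # guaranteed by extensionality
same_on_B = sat(B, M1, {}) == sat(B, M2, {})  # no such guarantee
```

Here same_on_A holds because the two structures agree on every symbol
occurring in A, while same_on_B fails: nothing forces agreement on a
sentence whose symbols are interpreted differently.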


Moreover, the value of a term, and whether or not a structure 
satisfies a formula, only depend on the values of its subterms. 


Proposition 7.21. Let M be a structure, t and t′ terms, and s a
variable assignment. Then Val^M_s(t[t′/x]) = Val^M_{s[Val^M_s(t′)/x]}(t).


Proof. By induction on t.


1. If t is a constant, say, t = c, then t[t′/x] = c, and
   Val^M_s(c) = c^M = Val^M_{s[Val^M_s(t′)/x]}(c).

2. If t is a variable other than x, say, t = y, then t[t′/x] = y,
   and Val^M_s(y) = Val^M_{s[Val^M_s(t′)/x]}(y) since
   s ∼_x s[Val^M_s(t′)/x].

3. If t = x, then t[t′/x] = t′. But
   Val^M_{s[Val^M_s(t′)/x]}(x) = Val^M_s(t′)
   by definition of s[Val^M_s(t′)/x].

4. If t = f(t₁,...,tₙ) then we have:

   Val^M_s(t[t′/x])
     = Val^M_s(f(t₁[t′/x], ..., tₙ[t′/x]))
         by definition of t[t′/x]
     = f^M(Val^M_s(t₁[t′/x]), ..., Val^M_s(tₙ[t′/x]))
         by definition of Val^M_s(f(...))
     = f^M(Val^M_{s[Val^M_s(t′)/x]}(t₁), ..., Val^M_{s[Val^M_s(t′)/x]}(tₙ))
         by induction hypothesis
     = Val^M_{s[Val^M_s(t′)/x]}(t)
         by definition of Val^M_{s[Val^M_s(t′)/x]}(f(...))   □


Proposition 7.22. Let M be a structure, A a formula, t′ a term,
and s a variable assignment. Then M,s ⊨ A[t′/x] iff
M,s[Val^M_s(t′)/x] ⊨ A.


Proof. Exercise. □


The point of Propositions 7.21 and 7.22 is the following. Suppose
we have a term t or a formula A and some term t′, and we
want to know the value of t[t′/x] or whether or not A[t′/x] is
satisfied in a structure M relative to a variable assignment s. Then
we can either perform the substitution first and then consider the
value or satisfaction relative to M and s, or we can first determine
the value m = Val^M_s(t′) of t′ in M relative to s, change the
variable assignment to s[m/x] and then consider the value of t
in M and s[m/x], or whether M,s[m/x] ⊨ A. Propositions 7.21
and 7.22 guarantee that the answer will be the same, whichever
way we do it.
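

The two routes for terms can be compared concretely. The sketch below
is only an illustration: the tuple encoding of terms, the helper names
value and substitute, and the sample structure (domain {0,...,4}, a
constant c, a two-place function f) are assumptions chosen for the
example.

```python
def value(term, M, s):
    """Value of a term in structure M under the variable assignment s."""
    kind = term[0]
    if kind == "var":
        return s[term[1]]
    if kind == "const":
        return M["consts"][term[1]]
    args = [value(arg, M, s) for arg in term[2:]]
    return M["funcs"][term[1]](*args)

def substitute(term, x, replacement):
    """term[replacement/x]: replace every occurrence of the variable x."""
    kind = term[0]
    if kind == "var":
        return replacement if term[1] == x else term
    if kind == "const":
        return term
    return term[:2] + tuple(substitute(arg, x, replacement) for arg in term[2:])

# Hypothetical structure: c is interpreted as 3, f as addition modulo 5.
M = {"consts": {"c": 3}, "funcs": {"f": lambda a, b: (a + b) % 5}}
s = {"x": 1, "y": 2}

t = ("fn", "f", ("var", "x"), ("fn", "f", ("var", "y"), ("var", "x")))  # f(x, f(y, x))
t_prime = ("fn", "f", ("const", "c"), ("var", "y"))                     # f(c, y)

# Route 1: substitute first, then evaluate under s.
lhs = value(substitute(t, "x", t_prime), M, s)

# Route 2: evaluate t' under s, change the assignment at x, then evaluate t.
m = value(t_prime, M, s)
rhs = value(t, M, {**s, "x": m})
```

In this instance m is 0 and both routes yield the value 2, as
Proposition 7.21 predicts.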


7.7 Semantic Notions


Given the definition of structures for first-order languages, we
can define some basic semantic properties of and relationships
between sentences. The simplest of these is the notion of validity
of a sentence. A sentence is valid if it is satisfied in every
structure. Valid sentences are those that are satisfied regardless of
how the non-logical symbols in them are interpreted. Valid sentences
are therefore also called logical truths: they are true, i.e.,
satisfied, in any structure and hence their truth depends only on the
logical symbols occurring in them and their syntactic structure, but
not on the non-logical symbols or their interpretation.


Definition 7.23 (Validity). A sentence A is valid, ⊨ A, iff M ⊨ A
for every structure M.


Definition 7.24 (Entailment). A set of sentences Γ entails a
sentence A, Γ ⊨ A, iff for every structure M with M ⊨ Γ, M ⊨ A.


Definition 7.25 (Satisfiability). A set of sentences Γ is
satisfiable if M ⊨ Γ for some structure M. If Γ is not satisfiable
it is called unsatisfiable.


Proposition 7.26. A sentence A is valid iff Γ ⊨ A for every set of
sentences Γ.


Proof. For the forward direction, let A be valid, and let Γ be a
set of sentences. Let M be a structure so that M ⊨ Γ. Since A is
valid, M ⊨ A, hence Γ ⊨ A.

For the contrapositive of the reverse direction, let A be invalid,
so there is a structure M with M ⊭ A. When Γ = {⊤}, since
⊤ is valid, M ⊨ Γ. Hence, there is a structure M so that M ⊨ Γ
but M ⊭ A, hence Γ does not entail A. □


Proposition 7.27. Γ ⊨ A iff Γ ∪ {¬A} is unsatisfiable.


Proof. For the forward direction, suppose Γ ⊨ A and suppose to
the contrary that there is a structure M so that M ⊨ Γ ∪ {¬A}.
Since M ⊨ Γ and Γ ⊨ A, M ⊨ A. Also, since M ⊨ Γ ∪ {¬A}, M ⊨ ¬A,
so we have both M ⊨ A and M ⊭ A, a contradiction. Hence,
there can be no such structure M, so Γ ∪ {¬A} is unsatisfiable.
For the reverse direction, suppose Γ ∪ {¬A} is unsatisfiable.
So for every structure M, either M ⊭ Γ or M ⊨ A. Hence, for
every structure M with M ⊨ Γ, M ⊨ A, so Γ ⊨ A. □
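

Read contrapositively, Proposition 7.27 says that Γ does not entail A
exactly when Γ ∪ {¬A} has a model. A search through small finite
structures can sometimes find such a model. The sketch below looks for
a structure satisfying ∀x ∃y R(x,y) but not ∃y ∀x R(x,y); the
function names and the bound on the domain size are assumptions of this
example. A bounded search of this kind can refute an entailment, but
finding no countermodel does not establish one.

```python
from itertools import product

def relations(domain):
    """Yield every binary relation on the domain, as a set of pairs."""
    pairs = [(a, b) for a in domain for b in domain]
    for bits in product([False, True], repeat=len(pairs)):
        yield {p for p, bit in zip(pairs, bits) if bit}

def premise(domain, R):
    """Does the structure satisfy: for all x there is y with R(x, y)?"""
    return all(any((x, y) in R for y in domain) for x in domain)

def conclusion(domain, R):
    """Does the structure satisfy: there is y such that for all x, R(x, y)?"""
    return any(all((x, y) in R for x in domain) for y in domain)

def find_countermodel(max_size):
    """Search domains {0,...,n-1} with n <= max_size for a model of the
    premise together with the negation of the conclusion."""
    for n in range(1, max_size + 1):
        domain = range(n)
        for R in relations(domain):
            if premise(domain, R) and not conclusion(domain, R):
                return n, R
    return None

countermodel = find_countermodel(3)
```

The search succeeds on a two-element domain, so the premise does not
entail the conclusion.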


Proposition 7.28. If Γ ⊆ Γ′ and Γ ⊨ A, then Γ′ ⊨ A.


Proof. Suppose that Γ ⊆ Γ′ and Γ ⊨ A. Let M be a structure such
that M ⊨ Γ′; then M ⊨ Γ, and since Γ ⊨ A, we get that M ⊨ A.
Hence, whenever M ⊨ Γ′, M ⊨ A, so Γ′ ⊨ A. □


Theorem 7.29 (Semantic Deduction Theorem). Γ ∪ {A} ⊨ B
iff Γ ⊨ A → B.


Proof. For the forward direction, let Γ ∪ {A} ⊨ B and let M be
a structure so that M ⊨ Γ. If M ⊨ A, then M ⊨ Γ ∪ {A}, so since
Γ ∪ {A} entails B, we get M ⊨ B. Therefore, M ⊨ A → B, so
Γ ⊨ A → B.

For the reverse direction, let Γ ⊨ A → B and M be a structure
so that M ⊨ Γ ∪ {A}. Then M ⊨ Γ, so M ⊨ A → B, and since
M ⊨ A, M ⊨ B. Hence, whenever M ⊨ Γ ∪ {A}, M ⊨ B, so
Γ ∪ {A} ⊨ B. □


Proposition 7.30. Let M be a structure, and A(x) a formula with
one free variable x, and t a closed term. Then:


1. A(t) ⊨ ∃x A(x)

2. ∀x A(x) ⊨ A(t)


Proof. 1. Suppose M ⊨ A(t). Let s be a variable assignment
with s(x) = Val^M(t). Then M,s ⊨ A(t) since A(t) is
a sentence. By Proposition 7.22, M,s ⊨ A(x). By
Proposition 7.18, M ⊨ ∃x A(x).


2. Exercise. □


Summary 


The semantics for a first-order language is given by a structure
for that language. It consists of a domain, and elements of that
domain are assigned to each constant symbol. Function symbols
are interpreted by functions and relation symbols by relations on
the domain. A function from the set of variables to the domain
is a variable assignment. The relation of satisfaction relates
structures, variable assignments and formulas; M,s ⊨ A is defined
by induction on the structure of A. M,s ⊨ A only depends on
the interpretation of the symbols actually occurring in A, and in
particular does not depend on s if A contains no free variables.
So if A is a sentence, M ⊨ A if M,s ⊨ A for any (or all) s.

The satisfaction relation is the basis for all semantic notions.
A sentence is valid, ⊨ A, if it is satisfied in every structure. A
sentence A is entailed by a set of sentences Γ, Γ ⊨ A, iff M ⊨ A for
all M which satisfy every sentence in Γ. A set Γ is satisfiable iff
there is some structure that satisfies every sentence in Γ, otherwise
unsatisfiable. These notions are interrelated, e.g., Γ ⊨ A iff
Γ ∪ {¬A} is unsatisfiable.


Problems 


Problem 7.1. Is N, the standard model of arithmetic, covered?
Explain.


Problem 7.2. Let ℒ = {c, f, A} with one constant symbol, one
one-place function symbol and one two-place predicate symbol,
and let the structure M be given by


1. |M| = {1, 2, 3}

2. c^M = 3

3. f^M(1) = 2, f^M(2) = 3, f^M(3) = 2

4. A^M = {⟨1,2⟩, ⟨2,3⟩, ⟨3,3⟩}


(a) Let s(v) = 1 for all variables v. Find out whether


M,s ⊨ ∃x (A(f(z), c) → ∀y (A(y, x) ∨ A(f(y), x)))


Explain why or why not.
(b) Give a different structure and variable assignment in 
which the formula is not satisfied. 


Problem 7.3. Complete the proof of Proposition 7.14.

Problem 7.4. Prove Proposition 7.17.

Problem 7.5. Prove Proposition 7.18.


Problem 7.6. Suppose ℒ is a language without function symbols.
Given a structure M, c a constant symbol and a ∈ |M|,
define M[a/c] to be the structure that is just like M, except that
c^{M[a/c]} = a. Define M ⊫ A for sentences A by:


1. A ≡ ⊥: not M ⊫ A.

2. A ≡ R(d₁,...,dₙ): M ⊫ A iff ⟨d₁^M,...,dₙ^M⟩ ∈ R^M.

3. A ≡ d₁ = d₂: M ⊫ A iff d₁^M = d₂^M.

4. A ≡ ¬B: M ⊫ A iff not M ⊫ B.

5. A ≡ (B ∧ C): M ⊫ A iff M ⊫ B and M ⊫ C.

6. A ≡ (B ∨ C): M ⊫ A iff M ⊫ B or M ⊫ C (or both).

7. A ≡ (B → C): M ⊫ A iff not M ⊫ B or M ⊫ C (or both).

8. A ≡ ∀x B: M ⊫ A iff for all a ∈ |M|, M[a/c] ⊫ B[c/x], if
   c does not occur in B.

9. A ≡ ∃x B: M ⊫ A iff there is an a ∈ |M| such that
   M[a/c] ⊫ B[c/x], if c does not occur in B.


Let x₁, ..., xₙ be all free variables in A, c₁, ..., cₙ constant
symbols not in A, a₁, ..., aₙ ∈ |M|, and s(xᵢ) = aᵢ.

Show that M,s ⊨ A iff M[a₁/c₁,...,aₙ/cₙ] ⊫
A[c₁/x₁]...[cₙ/xₙ].

(This problem shows that it is possible to give a semantics for 
first-order logic that makes do without variable assignments.) 


Problem 7.7. Suppose that f is a function symbol not in A(x,y).
Show that there is a structure M such that M ⊨ ∀x ∃y A(x,y) iff
there is an M′ such that M′ ⊨ ∀x A(x, f(x)).

(This problem is a special case of what’s known as Skolem’s
Theorem; ∀x A(x, f(x)) is called a Skolem normal form of
∀x ∃y A(x,y).)


Problem 7.8. Carry out the proof of Proposition 7.19 in detail.

Problem 7.9. Prove Proposition 7.22.


Problem 7.10. 1. Show that Γ ⊨ ⊥ iff Γ is unsatisfiable.

2. Show that Γ ∪ {A} ⊨ ⊥ iff Γ ⊨ ¬A.

3. Suppose c does not occur in A or Γ. Show that Γ ⊨ ∀x A iff
Γ ⊨ A[c/x].


Problem 7.11. Complete the proof of Proposition 7.30. 


CHAPTER 8


Theories and
Their Models


8.1 Introduction 


The development of the axiomatic method is a significant 
achievement in the history of science, and is of special impor- 
tance in the history of mathematics. An axiomatic development 
of a field involves the clarification of many questions: What is the 
field about? What are the most fundamental concepts? How are 
they related? Can all the concepts of the field be defined in terms 
of these fundamental concepts? What laws do, and must, these 
concepts obey? 

The axiomatic method and logic were made for each other. 
Formal logic provides the tools for formulating axiomatic theo- 
ries, for proving theorems from the axioms of the theory in a 
precisely specified way, for studying the properties of all systems 
satisfying the axioms in a systematic way. 


Definition 8.1. A set of sentences Γ is closed iff, whenever Γ ⊨ A
then A ∈ Γ. The closure of a set of sentences Γ is {A : Γ ⊨ A}.

We say that Γ is axiomatized by a set of sentences Δ if Γ is the
closure of Δ.


We can think of an axiomatic theory as the set of sentences
that is axiomatized by its set of axioms Δ. In other words, when
we have a first-order language which contains non-logical sym- 
bols for the primitives of the axiomatically developed science we 
wish to study, together with a set of sentences that express the 
fundamental laws of the science, we can think of the theory as 
represented by all the sentences in this language that are entailed 
by the axioms. This ranges from simple examples with only a 
single primitive and simple axioms, such as the theory of partial 
orders, to complex theories such as Newtonian mechanics. 

The logical facts that make this formal approach to the axiomatic
method so important are the following. Suppose Γ is an axiom
system for a theory, i.e., a set of sentences.


1. We can state precisely when an axiom system captures an
intended class of structures. That is, if we are interested
in a certain class of structures, we will successfully capture
that class by an axiom system Γ iff the structures are exactly
those M such that M ⊨ Γ.


2. We may fail in this respect because there are M such that
M ⊨ Γ, but M is not one of the structures we intend. This
may lead us to add axioms which are not true in M.


3. If we are successful at least in the respect that Γ is true
in all the intended structures, then a sentence A is true in
all intended structures whenever Γ ⊨ A. Thus we can use
logical tools (such as derivation methods) to show that
sentences are true in all intended structures simply by showing
that they are entailed by the axioms.


4. Sometimes we don’t have intended structures in mind, but
instead start from the axioms themselves: we begin with
some primitives that we want to satisfy certain laws which
we codify in an axiom system. One thing that we would
like to verify right away is that the axioms do not contradict
each other: if they do, there can be no concepts that obey
these laws, and we have tried to set up an incoherent theory.
We can verify that this doesn’t happen by finding a model
of Γ. And if there are models of our theory, we can use
logical methods to investigate them, and we can also use
logical methods to construct models.


5. The independence of the axioms is likewise an important
question. It may happen that one of the axioms is actually
a consequence of the others, and so is redundant. We
can prove that an axiom A in Γ is redundant by proving
Γ \ {A} ⊨ A. We can also prove that an axiom is not
redundant by showing that (Γ \ {A}) ∪ {¬A} is satisfiable. For
instance, this is how it was shown that the parallel postulate
is independent of the other axioms of geometry.


6. Another important question is that of definability of
concepts in a theory: The choice of the language determines
what the models of a theory consist of. But not every aspect
of a theory must be represented separately in its models.
For instance, every ordering ≤ determines a corresponding
strict ordering <: given one, we can define the other. So it
is not necessary that a model of a theory involving such an
order must also contain the corresponding strict ordering.
When is it the case, in general, that one relation can be
defined in terms of others? When is it impossible to define
a relation in terms of others (so that we must add it to the
primitives of the language)?


8.2 Expressing Properties of Structures 


It is often useful and important to express conditions on
functions and relations, or more generally, that the functions and
relations in a structure satisfy these conditions. For instance, we
would like to have ways of distinguishing those structures for a
language which “capture” what we want the predicate symbols
to “mean” from those that do not. Of course we’re completely
free to specify which structures we “intend,” e.g., we can specify
that the interpretation of the predicate symbol < must be an
ordering, or that we are only interested in interpretations of ℒ in
which the domain consists of sets and ∈ is interpreted by the “is
an element of” relation. But can we do this with sentences of the
language? In other words, which conditions on a structure M can
we express by a sentence (or perhaps a set of sentences) in the
language of M? There are some conditions that we will not be
able to express. For instance, there is no sentence of ℒ_A which is
only true in a structure M if |M| = ℕ. We cannot express “the
domain contains only natural numbers.” But there are “structural
properties” of structures that we perhaps can express. Which
properties of structures can we express by sentences? Or, to put
it another way, which collections of structures can we describe as
those making a sentence (or set of sentences) true?


Definition 8.2 (Model of a set). Let Γ be a set of sentences in
a language ℒ. We say that a structure M is a model of Γ if M ⊨ A
for all A ∈ Γ.


Example 8.3. The sentence ∀x x ≤ x is true in M iff ≤^M is a
reflexive relation. The sentence ∀x ∀y ((x ≤ y ∧ y ≤ x) → x = y) is
true in M iff ≤^M is antisymmetric. The sentence
∀x ∀y ∀z ((x ≤ y ∧ y ≤ z) → x ≤ z) is true in M iff ≤^M is
transitive. Thus, the models of


{ ∀x x ≤ x,
  ∀x ∀y ((x ≤ y ∧ y ≤ x) → x = y),
  ∀x ∀y ∀z ((x ≤ y ∧ y ≤ z) → x ≤ z) }


are exactly those structures in which ≤^M is reflexive,
antisymmetric, and transitive, i.e., a partial order. Hence, we can
take them as axioms for the first-order theory of partial orders.
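

For a finite structure, being a model of these three axioms can be
checked by brute force. The following sketch does so for a relation
given as a set of pairs; the function names and the two sample
relations on {1, 2, 3, 6} are assumptions chosen for illustration.

```python
def is_reflexive(domain, leq):
    """The structure satisfies: for all x, x <= x."""
    return all((x, x) in leq for x in domain)

def is_antisymmetric(domain, leq):
    """The structure satisfies: x <= y and y <= x imply x = y."""
    return all(x == y for x in domain for y in domain
               if (x, y) in leq and (y, x) in leq)

def is_transitive(domain, leq):
    """The structure satisfies: x <= y and y <= z imply x <= z."""
    return all((x, z) in leq
               for x in domain for y in domain for z in domain
               if (x, y) in leq and (y, z) in leq)

def is_partial_order(domain, leq):
    """Is the structure a model of all three partial order axioms?"""
    return (is_reflexive(domain, leq)
            and is_antisymmetric(domain, leq)
            and is_transitive(domain, leq))

domain = {1, 2, 3, 6}
# Interpret the predicate symbol by divisibility: a model of the axioms.
divides = {(a, b) for a in domain for b in domain if b % a == 0}
# Interpret it by strict less-than: reflexivity fails, so not a model.
less_than = {(a, b) for a in domain for b in domain if a < b}

divides_is_model = is_partial_order(domain, divides)
less_than_is_model = is_partial_order(domain, less_than)
```

Divisibility on {1, 2, 3, 6} is a partial order that is not linear
(2 and 3 are incomparable), which the axioms permit.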


8.3 Examples of First-Order Theories


Example 8.4. The theory of strict linear orders in the
language ℒ_< is axiomatized by the set


{ ∀x ¬x < x,
  ∀x ∀y ((x < y ∨ y < x) ∨ x = y),
  ∀x ∀y ∀z ((x < y ∧ y < z) → x < z) }


It completely captures the intended structures: every strict linear
order is a model of this axiom system, and vice versa, if R is a
linear order on a set X, then the structure M with |M| = X and
<^M = R is a model of this theory.


Example 8.5. The theory of groups in the language 1 (constant
symbol), · (two-place function symbol) is axiomatized by


∀x (x · 1) = x
∀x ∀y ∀z (x · (y · z)) = ((x · y) · z)
∀x ∃y (x · y) = 1
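

Whether a finite structure is a model of these three axioms can again
be tested exhaustively. In the sketch below, the function name and the
choice of arithmetic modulo 5 as sample interpretations are assumptions
made for illustration.

```python
def is_model_of_group_axioms(domain, op, one):
    """Check the three group axioms in a finite structure whose domain is
    `domain`, with `op` interpreting the function symbol and `one` the
    constant symbol."""
    right_identity = all(op(x, one) == x for x in domain)
    associative = all(op(x, op(y, z)) == op(op(x, y), z)
                      for x in domain for y in domain for z in domain)
    right_inverse = all(any(op(x, y) == one for y in domain) for x in domain)
    return right_identity and associative and right_inverse

def add(x, y):
    return (x + y) % 5

def mul(x, y):
    return (x * y) % 5

# Addition modulo 5 on {0,...,4}, with the constant interpreted as 0.
additive = is_model_of_group_axioms(range(5), add, 0)
# Multiplication modulo 5 on {0,...,4}: 0 has no inverse.
multiplicative_with_zero = is_model_of_group_axioms(range(5), mul, 1)
# Multiplication modulo 5 on {1,...,4}, with the constant interpreted as 1.
multiplicative_nonzero = is_model_of_group_axioms(range(1, 5), mul, 1)
```

The second structure fails the third axiom only, which shows that the
inverse axiom is not entailed by the other two.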


Example 8.6. The theory of Peano arithmetic is axiomatized by
the following sentences in the language of arithmetic ℒ_A.


∀x ∀y (x′ = y′ → x = y)
∀x 0 ≠ x′
∀x (x + 0) = x
∀x ∀y (x + y′) = (x + y)′
∀x (x × 0) = 0
∀x ∀y (x × y′) = ((x × y) + x)
∀x ∀y (x < y ↔ ∃z (z′ + x) = y)


plus all sentences of the form


(A(0) ∧ ∀x (A(x) → A(x′))) → ∀x A(x)


Since there are infinitely many sentences of the latter form, this
axiom system is infinite. The latter form is called the induction
schema. (Actually, the induction schema is a bit more complicated
than we let on here.)

The last of the individually listed axioms is an explicit
definition of <.


Example 8.7. The theory of pure sets plays an important role 
in the foundations (and in the philosophy) of mathematics. A set 
is pure if all its elements are also pure sets. The empty set counts 
therefore as pure, but a set that has something as an element that 
is not a set would not be pure. So the pure sets are those that are 
formed just from the empty set and no “urelements,” i.e., objects 
that are not themselves sets. 

The following might be considered as an axiom system for a 
theory of pure sets: 


∃x ¬∃y y ∈ x
∀x ∀y (∀z (z ∈ x ↔ z ∈ y) → x = y)
∀x ∀y ∃z ∀u (u ∈ z ↔ (u = x ∨ u = y))
∀x ∃y ∀z (z ∈ y ↔ ∃u (z ∈ u ∧ u ∈ x))


plus all sentences of the form


∃x ∀y (y ∈ x ↔ A(y))


The first axiom says that there is a set with no elements (i.e., ∅
exists); the second says that sets are extensional; the third that
for any sets X and Y, the set {X,Y} exists; the fourth that for
any set X, the set ∪X exists, where ∪X is the union of all the
elements of X.

The sentences mentioned last are collectively called the naive
comprehension scheme. It essentially says that for every A(x), the
set {x : A(x)} exists, so at first glance a true, useful, and perhaps
even necessary axiom. It is called “naive” because, as it turns out,
it makes this theory unsatisfiable: if you take A(y) to be ¬y ∈ y,
you get the sentence


∃x ∀y (y ∈ x ↔ ¬y ∈ y)


and this sentence is not satisfied in any structure.


Example 8.8. In the area of mereology, the relation of parthood is 
a fundamental relation. Just like theories of sets, there are theo- 
ries of parthood that axiomatize various conceptions (sometimes 
conflicting) of this relation. 

The language of mereology contains a single two-place pred- 
icate symbol P, and P(x,y) “means” that x is a part of y. When 
we have this interpretation in mind, a structure for this language 
is called a parthood structure. Of course, not every structure for a 
single two-place predicate will really deserve this name. To have 
a chance of capturing “parthood,” P^M must satisfy some
conditions, which we can lay down as axioms for a theory of parthood.
For instance, parthood is a partial order on objects: every object 
is a part (albeit an improper part) of itself; no two different objects 
can be parts of each other; a part of a part of an object is itself 
part of that object. Note that in this sense “is a part of” resembles 
“is a subset of,” but does not resemble “is an element of” which 
is neither reflexive nor transitive. 


∀x P(x,x)
∀x ∀y ((P(x,y) ∧ P(y,x)) → x = y)
∀x ∀y ∀z ((P(x,y) ∧ P(y,z)) → P(x,z))


Moreover, any two objects have a mereological sum (an object
that has these two objects as parts, and is minimal in this respect).


∀x ∀y ∃z ∀u (P(z,u) ↔ (P(x,u) ∧ P(y,u)))


These are only some of the basic principles of parthood
considered by metaphysicians. Further principles, however, quickly
become hard to formulate or write down without first introducing
some defined relations. For instance, most metaphysicians
interested in mereology also view the following as a valid principle:
whenever an object x has a proper part y, it also has a part z that
has no parts in common with y, and so that the fusion of y and
z is x.


8.4 Expressing Relations in a Structure


One main use formulas can be put to is to express properties and
relations in a structure M in terms of the primitives of the
language ℒ of M. By this we mean the following: the domain of M
is a set of objects. The constant symbols, function symbols, and
predicate symbols are interpreted in M by some objects in |M|,
functions on |M|, and relations on |M|. For instance, if A²₀ is in
ℒ, then M assigns to it a relation R = (A²₀)^M. Then the formula
A²₀(v₁, v₂) expresses that very relation, in the following sense: if a
variable assignment s maps v₁ to a ∈ |M| and v₂ to b ∈ |M|, then


Rab iff M,s ⊨ A²₀(v₁, v₂).


Note that we have to involve variable assignments here: we can’t
just say “Rab iff M ⊨ A²₀(a, b)” because a and b are not symbols
of our language: they are elements of |M|.

Since we don’t just have atomic formulas, but can combine 
them using the logical connectives and the quantifiers, more com- 
plex formulas can define other relations which aren’t directly built 
into M. We're interested in how to do that, and specifically, which 
relations we can define in a structure. 


Definition 8.9. Let A(v₁,...,vₙ) be a formula of ℒ in which only
v₁, ..., vₙ occur free, and let M be a structure for ℒ. A(v₁,...,vₙ)
expresses the relation R ⊆ |M|ⁿ iff


Ra₁...aₙ iff M,s ⊨ A(v₁,...,vₙ)


for any variable assignment s with s(vᵢ) = aᵢ (i = 1,...,n).


Example 8.10. In the standard model of arithmetic N, the
formula v₁ < v₂ ∨ v₁ = v₂ expresses the ≤ relation on ℕ. The
formula v₂ = v₁′ expresses the successor relation, i.e., the relation
R ⊆ ℕ² where Rnm holds if m is the successor of n. The
formula v₁ = v₂′ expresses the predecessor relation. The formulas
∃v₃ (v₃ ≠ 0 ∧ v₂ = (v₁ + v₃)) and ∃v₃ (v₁ + v₃′) = v₂ both express
the < relation. This means that the predicate symbol < is actually
superfluous in the language of arithmetic; it can be defined.
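

The claim that both formulas express < can be spot-checked on an
initial segment of the natural numbers. The sketch below is only a
bounded sanity check, not a proof: the bound BOUND and the function
names are assumptions of this example, and the existential quantifier
is searched only up to that bound (which suffices for arguments below
it).

```python
BOUND = 30  # arbitrary cutoff for this illustration

def formula_1(v1, v2):
    """exists v3 (v3 != 0 and v2 = v1 + v3), with v3 searched up to BOUND."""
    return any(v3 != 0 and v2 == v1 + v3 for v3 in range(BOUND + 1))

def formula_2(v1, v2):
    """exists v3 (v1 + v3') = v2, where v3' is the successor v3 + 1."""
    return any(v1 + (v3 + 1) == v2 for v3 in range(BOUND + 1))

# Compare both formulas with the built-in < on all pairs below BOUND.
both_express_less_than = all(
    formula_1(a, b) == (a < b) and formula_2(a, b) == (a < b)
    for a in range(BOUND) for b in range(BOUND)
)
```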


This idea is not just interesting in specific structures, but
generally whenever we use a language to describe an intended model
or models, i.e., when we consider theories. These theories often
contain only a few predicate symbols as basic symbols, but in the
domain they are used to describe, many other relations often play
an important role. If these other relations can be systematically
expressed by the relations that interpret the basic predicate
symbols of the language, we say we can define them in the language.


8.5 The Theory of Sets 


Almost all of mathematics can be developed in the theory of
sets. Developing mathematics in this theory involves a number
of things. First, it requires a set of axioms for the relation ∈. A
number of different axiom systems have been developed,
sometimes with conflicting properties of ∈. The axiom system known
as ZFC, Zermelo-Fraenkel set theory with the axiom of choice,
stands out: it is by far the most widely used and studied, because
it turns out that its axioms suffice to prove almost all the things
mathematicians expect to be able to prove. But before that can
be established, it first is necessary to make clear how we can even
express all the things mathematicians would like to express. For
starters, the language contains no constant symbols or function
symbols, so it seems at first glance unclear that we can talk about
particular sets (such as ∅ or ℕ), can talk about operations on sets
(such as X ∪ Y and ℘(X)), let alone other constructions which
involve things other than sets, such as relations and functions.

To begin with, “is an element of” is not the only relation we
are interested in: “is a subset of” seems almost as important. But
we can define “is a subset of” in terms of “is an element of.” To
do this, we have to find a formula A(x,y) in the language of set
theory which is satisfied by a pair of sets ⟨X,Y⟩ iff X ⊆ Y. But X
is a subset of Y just in case all elements of X are also elements
of Y. So we can define ⊆ by the formula


∀z (z ∈ x → z ∈ y)


Now, whenever we want to use the relation ⊆ in a formula, we
could instead use that formula (with x and y suitably replaced,
and the bound variable z renamed if necessary). For instance,
extensionality of sets means that if any sets x and y are contained
in each other, then x and y must be the same set. This can be
expressed by ∀x ∀y ((x ⊆ y ∧ y ⊆ x) → x = y), or, if we replace ⊆
by the above definition, by


∀x ∀y ((∀z (z ∈ x → z ∈ y) ∧ ∀z (z ∈ y → z ∈ x)) → x = y).


This is in fact one of the axioms of ZFC, the “axiom of exten- 
sionality.” 

There is no constant symbol for ∅, but we can express “x
is empty” by ¬∃y y ∈ x. Then “∅ exists” becomes the
sentence ∃x ¬∃y y ∈ x. This is another axiom of ZFC. (Note that
the axiom of extensionality implies that there is only one empty
set.) Whenever we want to talk about ∅ in the language of set
theory, we would write this as “there is a set that’s empty and
...” As an example, to express the fact that ∅ is a subset of every
set, we could write


∃x (¬∃y y ∈ x ∧ ∀z x ⊆ z)


where, of course, x ⊆ z would in turn have to be replaced by its
definition.

To talk about operations on sets, such as X ∪ Y and ℘(X),
we have to use a similar trick. There are no function symbols
in the language of set theory, but we can express the functional
relations X ∪ Y = Z and ℘(X) = Y by


∀u ((u ∈ x ∨ u ∈ y) ↔ u ∈ z)
∀u (u ⊆ x ↔ u ∈ y)


since the elements of X ∪ Y are exactly the sets that are either
elements of X or elements of Y, and the elements of ℘(X) are
exactly the subsets of X. However, this doesn’t allow us to use
x ∪ y or ℘(x) as if they were terms: we can only use the entire
formulas that define the relations X ∪ Y = Z and ℘(X) = Y. In
fact, we do not know that these relations are ever satisfied, i.e.,
we do not know that unions and power sets always exist. For
instance, the sentence ∀x ∃y ℘(x) = y is another axiom of ZFC
(the power set axiom).

Now what about talk of ordered pairs or functions? Here we
have to explain how we can think of ordered pairs and functions
as special kinds of sets. One way to define the ordered pair ⟨x,y⟩
is as the set {{x},{x,y}}. But like before, we cannot introduce
a function symbol that names this set; we can only define the
relation ⟨x,y⟩ = z, i.e., {{x},{x,y}} = z:


∀u (u ∈ z ↔ (∀v (v ∈ u ↔ v = x) ∨ ∀v (v ∈ u ↔ (v = x ∨ v = y))))


This says that the elements u of z are exactly those sets which
either have x as their only element or have x and y as their only
elements (in other words, those sets that are either identical to
{x} or identical to {x,y}). Once we have this, we can say further
things, e.g., that X × Y = Z:


∀z (z ∈ Z ↔ ∃x ∃y (x ∈ X ∧ y ∈ Y ∧ ⟨x,y⟩ = z))
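

What makes {{x},{x,y}} usable as an ordered pair is that two such
sets are equal only if their components agree in order. The following
sketch checks this on a few hereditarily finite sets, modelled with
Python's frozenset; the names pair and cartesian_product and the
three-element sample universe are assumptions of this example.

```python
from itertools import product

def pair(x, y):
    """The set {{x}, {x, y}} standing in for the ordered pair of x and y."""
    return frozenset({frozenset({x}), frozenset({x, y})})

def cartesian_product(X, Y):
    """The set of all pairs with first component in X and second in Y."""
    return frozenset(pair(x, y) for x in X for y in Y)

# A small universe of pure sets: the empty set, {empty}, {empty, {empty}}.
empty = frozenset()
one = frozenset({empty})
two = frozenset({empty, one})
universe = [empty, one, two]

# Pairs are equal exactly when both components are equal.
characteristic_property = all(
    (pair(a, b) == pair(c, d)) == (a == c and b == d)
    for a, b, c, d in product(universe, repeat=4)
)

# The order of the components matters, unlike for the unordered set {x, y}.
order_matters = pair(empty, one) != pair(one, empty)

# "two" has two elements, so its product with itself has four pairs.
size_of_product = len(cartesian_product(two, two))
```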


A function f: X → Y can be thought of as the relation f(x) = y,
i.e., as the set of pairs {⟨x,y⟩ : f(x) = y}. We can then say that
a set f is a function from X to Y if (a) it is a relation ⊆ X × Y,
(b) it is total, i.e., for all x ∈ X there is some y ∈ Y such that
⟨x,y⟩ ∈ f and (c) it is functional, i.e., whenever ⟨x,y⟩, ⟨x,y′⟩ ∈ f,
y = y′ (because values of functions must be unique). So “f is a
function from X to Y” can be written as:


∀u (u ∈ f → ∃x ∃y (x ∈ X ∧ y ∈ Y ∧ ⟨x,y⟩ = u)) ∧
∀x (x ∈ X → (∃y (y ∈ Y ∧ maps(f,x,y)) ∧
    (∀y ∀y′ ((maps(f,x,y) ∧ maps(f,x,y′)) → y = y′))))


where maps(f,x,y) abbreviates ∃v (v ∈ f ∧ ⟨x,y⟩ = v) (this
formula expresses “f(x) = y”).

It is now also not hard to express that f: X → Y is injective,
for instance:


f: X → Y ∧ ∀x ∀x′ ((x ∈ X ∧ x′ ∈ X ∧
    ∃y (maps(f,x,y) ∧ maps(f,x′,y))) → x = x′)


A function f: X → Y is injective iff, whenever f maps x, x′ ∈ X
to a single y, x = x′. If we abbreviate this formula as inj(f,X,Y),
we’re already in a position to state in the language of set theory
something as non-trivial as Cantor’s theorem: there is no injective
function from ℘(X) to X:


∀X ∀Y (℘(X) = Y → ¬∃f inj(f,Y,X))
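

For very small finite sets, the three conditions on functions and the
statement of Cantor's theorem can be checked exhaustively. The sketch
below is an illustration under simplifying assumptions: pairs are
Python tuples rather than sets of the form {{x},{x,y}}, and the helper
names are invented for this example. It confirms instances of the
theorem; it does not prove it.

```python
from itertools import combinations, product

def powerset(X):
    """All subsets of X, each as a frozenset."""
    items = list(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_function(f, X, Y):
    """f is a set of pairs that is (a) a relation between X and Y,
    (b) total on X, and (c) functional."""
    is_relation = all(x in X and y in Y for (x, y) in f)
    total = all(any((x, y) in f for y in Y) for x in X)
    functional = all(y1 == y2 for (x1, y1) in f for (x2, y2) in f if x1 == x2)
    return is_relation and total and functional

def is_injective(f, X, Y):
    """f is a function from X to Y that never maps two arguments to one value."""
    return is_function(f, X, Y) and all(
        x1 == x2 for (x1, y1) in f for (x2, y2) in f if y1 == y2)

def injection_exists(X, Y):
    """Brute-force search over every assignment of values in Y to X."""
    X, Y = list(X), list(Y)
    return any(is_injective(set(zip(X, values)), X, Y)
               for values in product(Y, repeat=len(X)))

# Instances of Cantor's theorem for sets with 0, 1, 2 and 3 elements.
cantor_instances = [not injection_exists(powerset(range(n)), range(n))
                    for n in range(4)]
# In the other direction an injection does exist, e.g. x maps to {x}.
other_direction = injection_exists(range(3), powerset(range(3)))
```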


One might think that set theory requires another axiom that
guarantees the existence of a set for every defining property. If
A(x) is a formula of set theory with the variable x free, we can
consider the sentence


∃y ∀x (x ∈ y ↔ A(x)).


This sentence states that there is a set y whose elements are all
and only those x that satisfy A(x). This schema is called the
“comprehension principle.” It looks very useful; unfortunately it
is inconsistent. Take A(x) ≡ ¬x ∈ x, then the comprehension
principle states


∃y ∀x (x ∈ y ↔ x ∉ x),


i.e., it states the existence of a set of all sets that are not elements
of themselves. No such set can exist: this is Russell’s Paradox.
ZFC, in fact, contains a restricted (and consistent) version of
this principle, the separation principle:


∀z ∃y ∀x (x ∈ y ↔ (x ∈ z ∧ A(x))).
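

That the Russell instance of comprehension has no models can be
illustrated by exhaustive search over small structures. The sketch
below enumerates every interpretation of ∈ on domains with up to three
elements and counts those satisfying ∃y ∀x (x ∈ y ↔ x ∉ x); the
function names and the size bound are assumptions of this example. The
count is zero in each case because taking x to be y itself would
require y ∈ y to hold iff it does not, and this reasoning applies in
every structure, not only the small ones searched here.

```python
from itertools import product

def satisfies_russell_instance(domain, elem):
    """Is there y such that, for all x, (x, y) in elem iff (x, x) not in elem?
    Here elem interprets the membership symbol as a set of pairs."""
    return any(
        all(((x, y) in elem) == ((x, x) not in elem) for x in domain)
        for y in domain
    )

def count_models(n):
    """Count the interpretations of membership on {0,...,n-1} that satisfy
    the Russell instance of comprehension."""
    domain = range(n)
    pairs = [(a, b) for a in domain for b in domain]
    count = 0
    for bits in product([False, True], repeat=len(pairs)):
        elem = {p for p, bit in zip(pairs, bits) if bit}
        if satisfies_russell_instance(domain, elem):
            count += 1
    return count

model_counts = [count_models(n) for n in (1, 2, 3)]
```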


8.6 Expressing the Size of Structures 


There are some properties of structures we can express even with- 
out using the non-logical symbols of a language. For instance, 
there are sentences which are true in a structure iff the domain of 
the structure has at least, at most, or exactly a certain number n 
of elements. 


Proposition 8.11. The sentence


A_{≥n} ≡ ∃x₁ ∃x₂ ... ∃xₙ
    (x₁ ≠ x₂ ∧ x₁ ≠ x₃ ∧ x₁ ≠ x₄ ∧ ··· ∧ x₁ ≠ xₙ ∧
     x₂ ≠ x₃ ∧ x₂ ≠ x₄ ∧ ··· ∧ x₂ ≠ xₙ ∧
     ⋮
     xₙ₋₁ ≠ xₙ)


is true in a structure M iff |M| contains at least n elements.
Consequently, M ⊨ ¬A_{≥n+1} iff |M| contains at most n elements.


Proposition 8.12. The sentence


A_{=n} ≡ ∃x₁ ∃x₂ ... ∃xₙ
    (x₁ ≠ x₂ ∧ x₁ ≠ x₃ ∧ x₁ ≠ x₄ ∧ ··· ∧ x₁ ≠ xₙ ∧
     x₂ ≠ x₃ ∧ x₂ ≠ x₄ ∧ ··· ∧ x₂ ≠ xₙ ∧
     ⋮
     xₙ₋₁ ≠ xₙ ∧
     ∀y (y = x₁ ∨ ··· ∨ y = xₙ))


is true in a structure M iff |M| contains exactly n elements.


There is no single purely logical sentence which is true in M iff
|M| is infinite. However, one can give sentences with non-logical
predicate symbols which only have infinite models (although not
every infinite structure is a model of them). The property of being
a finite structure, and the property of being an uncountable
structure cannot even be expressed with an infinite set of sentences.
These facts follow from the compactness and Löwenheim-Skolem
theorems.
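

The sentences of Propositions 8.11 and 8.12 follow a fixed pattern, so
they can be generated mechanically for any given n. The sketch below
builds them as strings; the function names, the variable naming scheme
x1, x2, ..., and the restriction to n ≥ 2 (so that the conjunction of
inequalities is nonempty) are assumptions of this example.

```python
from itertools import combinations

def _prefix_and_inequalities(n):
    """Quantifier prefix and pairwise inequalities shared by both sentences."""
    if n < 2:
        raise ValueError("this sketch only handles n >= 2")
    xs = [f"x{i}" for i in range(1, n + 1)]
    prefix = " ".join(f"∃{x}" for x in xs)
    distinct = " ∧ ".join(f"{a} ≠ {b}" for a, b in combinations(xs, 2))
    return xs, prefix, distinct

def at_least(n):
    """The sentence true exactly in structures with at least n elements."""
    _, prefix, distinct = _prefix_and_inequalities(n)
    return f"{prefix} ({distinct})"

def exactly(n):
    """The sentence true exactly in structures with exactly n elements."""
    xs, prefix, distinct = _prefix_and_inequalities(n)
    covers = " ∨ ".join(f"y = {x}" for x in xs)
    return f"{prefix} ({distinct} ∧ ∀y ({covers}))"

three = at_least(3)
two_exactly = exactly(2)
```

Each sentence contains one inequality per unordered pair of variables,
that is, n(n−1)/2 of them, so the sentences grow quadratically in n.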


Summary 


Sets of sentences in a sense describe the structures in which they
are jointly true; these structures are their models. Conversely, if
we start with a structure or set of structures, we might be
interested in the set of sentences they are models of; this is the
theory of the structure or set of structures. Any such set of
sentences has the property that every sentence entailed by them is
already in the set; they are closed. More generally, we call a set Γ
a theory if it is closed under entailment, and say Γ is axiomatized
by Δ if Γ consists of all sentences entailed by Δ.

Mathematics yields many examples of theories, e.g., the the- 
ories of linear orders, of groups, or theories of arithmetic, e.g., 
the theory axiomatized by Peano’s axioms. But there are many 
examples of important theories in other disciplines as well, e.g., 
relational databases may be thought of as theories, and meta- 
physics concerns itself with theories of parthood which can be 
axiomatized. 

One significant question when setting up a theory for study is
whether its language is expressive enough to allow us to formulate
everything we want the theory to talk about, and another is
whether it is strong enough to prove what we want it to prove. To
express a relation we need a formula with the requisite number
of free variables. In set theory, we only have ∈ as a relation
symbol, but it allows us to express x ⊆ y using ∀u (u ∈ x → u ∈ y).
Zermelo-Fraenkel set theory ZFC, in fact, is strong enough to
both express (almost) every mathematical claim and to (almost)
prove every mathematical theorem using a handful of axioms and
a chain of increasingly complicated definitions such as that of ⊆.


Problems 


Problem 8.1. Find formulas in ℒ_A which define the following
relations:


1. n is between i and j;
2. n evenly divides m (i.e., m is a multiple of n);
3. n is a prime number (i.e., no number other than 1 and n
evenly divides n).


Problem 8.2. Suppose the formula A(v₁, v₂) expresses the relation
R ⊆ |M|² in a structure M. Find formulas that express the
following relations:


1. the inverse R⁻¹ of R;
2. the relative product R | R;


Can you find a way to express R⁺, the transitive closure of R?


Problem 8.3. Let ℒ be the language containing a 2-place predicate
symbol < only (no other constant symbols, function symbols
or predicate symbols, except of course =). Let N be the structure
such that |N| = ℕ, and <^N = {⟨n, m⟩ : n < m}. Prove the
following:


1. {0} is definable in N; 


2. {1} is definable in N; 




3. {2} is definable in N;
4. for each n ∈ ℕ, the set {n} is definable in N;
5. every finite subset of |N| is definable in N;
6. every co-finite subset of |N| is definable in N (where X ⊆ ℕ
is co-finite iff ℕ \ X is finite).


Problem 8.4. Show that the comprehension principle is incon- 
sistent by giving a derivation that shows 


∃y ∀x (x ∈ y ↔ x ∉ x) ⊢ ⊥.


It may help to first show (A → ¬A) ∧ (¬A → A) ⊢ ⊥.


CHAPTER 9 


Derivation 
Systems 


9.1 Introduction


Logics commonly have both a semantics and a derivation system. 
The semantics concerns concepts such as truth, satisfiability, va- 
lidity, and entailment. The purpose of derivation systems is to 
provide a purely syntactic method of establishing entailment and 
validity. They are purely syntactic in the sense that a derivation 
in such a system is a finite syntactic object, usually a sequence 
(or other finite arrangement) of sentences or formulas. Good 
derivation systems have the property that any given sequence or 
arrangement of sentences or formulas can be verified mechani- 
cally to be “correct.” 

The simplest (and historically first) derivation systems for 
first-order logic were axiomatic. A sequence of formulas counts 
as a derivation in such a system if each individual formula in it 
is either among a fixed set of “axioms” or follows from formulas 
coming before it in the sequence by one of a fixed number of
“inference rules,” and it can be mechanically verified whether a
formula is an axiom and whether it follows correctly from other
formulas by one of the inference rules. Axiomatic derivation systems
are easy to describe (and also easy to handle meta-theoretically),
but derivations in them are hard to read and understand, and
are also hard to produce.

Other derivation systems have been developed with the aim 
of making it easier to construct derivations or easier to under- 
stand derivations once they are complete. Examples are natural 
deduction, truth trees, also known as tableaux proofs, and the se- 
quent calculus. Some derivation systems are designed especially 
with mechanization in mind, e.g., the resolution method is easy 
to implement in software (but its derivations are essentially im- 
possible to understand). Most of these other derivation systems 
represent derivations as trees of formulas rather than sequences. 
This makes it easier to see which parts of a derivation depend on 
which other parts. 

So for a given logic, such as first-order logic, the different 
derivation systems will give different explications of what it is for 
a sentence to be a theorem and what it means for a sentence to be 
derivable from some others. However that is done (via axiomatic 
derivations, natural deductions, sequent derivations, truth trees, 
resolution refutations), we want these relations to match the
semantic notions of validity and entailment. Let’s write ⊢ A for
“A is a theorem” and Γ ⊢ A for “A is derivable from Γ.” However
⊢ is defined, we want it to match up with ⊨, that is:


1. ⊢ A if and only if ⊨ A
2. Γ ⊢ A if and only if Γ ⊨ A


The “only if” direction of the above is called soundness. A deriva- 
tion system is sound if derivability guarantees entailment (or va- 
lidity). Every decent derivation system has to be sound; unsound 
derivation systems are not useful at all. After all, the entire
purpose of a derivation is to provide a syntactic guarantee of validity
or entailment. We’ll prove soundness for the derivation systems 
we present. 

The converse “if” direction is also important: it is called
completeness. A complete derivation system is strong enough to show
that A is a theorem whenever A is valid, and that Γ ⊢ A whenever
Γ ⊨ A. Completeness is harder to establish, and some logics
have no complete derivation systems. First-order logic does. Kurt
Gödel was the first one to prove completeness for a derivation
system of first-order logic in his 1929 dissertation.

Another concept that is connected to derivation systems is 
that of consistency. A set of sentences is called inconsistent if
anything whatsoever can be derived from it, and consistent
otherwise. Inconsistency is the syntactic counterpart to
unsatisfiability: like unsatisfiable sets, inconsistent sets of
sentences do not make good theories; they are defective in a
fundamental way. Consistent sets of sentences may not be true or
useful, but at least they pass that minimal threshold of logical
usefulness. For different derivation systems the specific definition
of consistency of sets of sentences might differ, but like ⊢, we
want consistency to coincide with its semantic counterpart,
satisfiability. We want it to always be the case that Γ is
consistent if and only if it is satisfiable. Here,
the “if” direction amounts to completeness (consistency guaran- 
tees satisfiability), and the “only if” direction amounts to sound- 
ness (satisfiability guarantees consistency). In fact, for classical 
first-order logic, the two versions of soundness and completeness 
are equivalent. 


9.2 The Sequent Calculus 


While many derivation systems operate with arrangements of sen- 
tences, the sequent calculus operates with sequents. A sequent is 
an expression of the form 


A₁, ..., Aₘ ⇒ B₁, ..., Bₙ,


that is, a pair of sequences of sentences, separated by the sequent
symbol ⇒. Either sequence may be empty. A derivation in the
sequent calculus is a tree of sequents, where the topmost sequents
are of a special form (they are called “initial sequents” or
“axioms”) and every other sequent follows from the sequents
immediately above it by one of the rules of inference. The rules of
inference either manipulate the sentences in the sequents (adding,
removing, or rearranging them on either the left or the right), or
they introduce a complex formula in the conclusion of the rule.
For instance, the ∧L rule allows the inference from A, Γ ⇒ Δ to
A ∧ B, Γ ⇒ Δ, and the →R rule allows the inference from
A, Γ ⇒ Δ, B to Γ ⇒ Δ, A → B, for any Γ, Δ, A, and B. (In particular,
Γ and Δ may be empty.)

The ⊢ relation based on the sequent calculus is defined as
follows: Γ ⊢ A iff there is some sequence Γ₀ such that every
sentence in Γ₀ is in Γ and there is a derivation with the sequent
Γ₀ ⇒ A at its root. A is a theorem in the sequent calculus if the
sequent ⇒ A has a derivation. For instance, here is a derivation
that shows that ⊢ (A ∧ B) → A:


         A ⇒ A
     ------------- ∧L
      A ∧ B ⇒ A
  ------------------- →R
    ⇒ (A ∧ B) → A


A set Γ is inconsistent in the sequent calculus if there is
a derivation of Γ₀ ⇒ (where every A ∈ Γ₀ is in Γ and the right
side of the sequent is empty). Using the rule WR, any sentence
can be derived from an inconsistent set.

The sequent calculus was invented in the 1930s by Gerhard 
Gentzen. Because of its systematic and symmetric design, it is 
a very useful formalism for developing a theory of derivations. 
It is relatively easy to find derivations in the sequent calculus, 
but these derivations are often hard to read and their connection
to proofs is sometimes not easy to see. It has proved to be a
very elegant approach to derivation systems, however, and many 
logics have sequent calculus systems. 


9.3 Natural Deduction 


Natural deduction is a derivation system intended to mirror
actual reasoning (especially the kind of regimented reasoning
employed by mathematicians). Actual reasoning proceeds by a
number of “natural” patterns. For instance, proof by cases allows us
to establish a conclusion on the basis of a disjunctive premise, 
by establishing that the conclusion follows from either of the dis- 
juncts. Indirect proof allows us to establish a conclusion by show- 
ing that its negation leads to a contradiction. Conditional proof 
establishes a conditional claim “if ...then ...” by showing that 
the consequent follows from the antecedent. Natural deduction 
is a formalization of some of these natural inferences. Each of 
the logical connectives and quantifiers comes with two rules, an 
introduction and an elimination rule, and they each correspond 
to one such natural inference pattern. For instance, →Intro
corresponds to conditional proof, and ∨Elim to proof by cases. A
particularly simple rule is ∧Elim, which allows the inference from
A ∧ B to A (or B).

One feature that distinguishes natural deduction from other 
derivation systems is its use of assumptions. A derivation in nat- 
ural deduction is a tree of formulas. A single formula stands 
at the root of the tree of formulas, and the “leaves” of the tree 
are formulas from which the conclusion is derived. In natural 
deduction, some leaf formulas play a role inside the derivation 
but are “used up” by the time the derivation reaches the conclu- 
sion. This corresponds to the practice, in actual reasoning, of 
introducing hypotheses which only remain in effect for a short 
while. For instance, in a proof by cases, we assume the truth of 
each of the disjuncts; in conditional proof, we assume the truth 
of the antecedent; in indirect proof, we assume the truth of the 
negation of the conclusion. This way of introducing hypotheti- 
cal assumptions and then doing away with them in the service of 
establishing an intermediate step is a hallmark of natural deduc- 
tion. The formulas at the leaves of a natural deduction derivation 
are called assumptions, and some of the rules of inference may 
“discharge” them. For instance, if we have a derivation of B from
some assumptions which include A, then the →Intro rule allows
us to infer A → B and discharge any assumption of the form A.
(To keep track of which assumptions are discharged at which
inferences, we label the inference and the assumptions it discharges
with a number.) The assumptions that remain undischarged at
the end of the derivation are together sufficient for the truth of the 
conclusion, and so a derivation establishes that its undischarged 
assumptions entail its conclusion. 

The relation Γ ⊢ A based on natural deduction holds iff there
is a derivation in which A is the last sentence in the tree, and every
leaf which is undischarged is in Γ. A is a theorem in natural
deduction iff there is a derivation in which A is the last sentence and
all assumptions are discharged. For instance, here is a derivation
that shows that ⊢ (A ∧ B) → A:


        [A ∧ B]¹
       ---------- ∧Elim
           A
  1 ---------------- →Intro
      (A ∧ B) → A


The label 1 indicates that the assumption A ∧ B is discharged at
the →Intro inference.
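
As an aside, the same two inference steps can be mirrored in a proof
assistant. The following Lean 4 snippet is a sketch for illustration
only: Lean is not used elsewhere in this text, and the correspondence
suggested in the comments (function abstraction for →Intro, projection
for ∧Elim) is an informal reading, not a definition of natural
deduction.

```lean
-- Lean 4. A and B stand for arbitrary propositions.
example (A B : Prop) : (A ∧ B) → A :=
  fun h =>    -- assume A ∧ B; h plays the role of the assumption [A ∧ B]¹
    h.left    -- from A ∧ B infer A, as in ∧Elim
```

The function abstraction closes off the hypothesis h, much as →Intro
discharges the assumption labelled 1.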

A set Γ is inconsistent iff Γ ⊢ ⊥ in natural deduction. The
rule ⊥_I makes it so that from an inconsistent set, any sentence
can be derived.

Natural deduction systems were developed by Gerhard 
Gentzen and Stanisław Jaśkowski in the 1930s, and later
developed by Dag Prawitz and Frederic Fitch. Because its inferences
mirror natural methods of proof, it is favored by philosophers. 
The versions developed by Fitch are often used in introductory 
logic textbooks. In the philosophy of logic, the rules of natural 
deduction have sometimes been taken to give the meanings of 
the logical operators (“proof-theoretic semantics”). 


9.4 Tableaux 


While many derivation systems operate with arrangements of sen- 
tences, tableaux operate with signed formulas. A signed formula 
is a pair consisting of a truth value sign (T or F) and a sentence


T A or F A.


A tableau consists of signed formulas arranged in a
downward-branching tree. It begins with a number of assumptions and
continues with signed formulas which result from one of the signed
formulas above it by applying one of the rules of inference. Each 
rule allows us to add one or more signed formulas to the end 
of a branch, or two signed formulas side by side—in this case a 
branch splits into two, with the two added signed formulas form- 
ing the ends of the two branches. 

A rule applied to a complex signed formula results in the 
addition of signed formulas which are immediate sub-formulas. 
The rules come in pairs, one rule for each of the two signs. For
instance, the ∧T rule applies to T A ∧ B, and allows the addition
of both signed formulas T A and T B to the end of any
branch containing T A ∧ B, and the ∧F rule allows a branch
containing F A ∧ B to be split by adding F A and F B side by side.
A tableau is closed if every one of its branches contains a matching
pair of signed formulas T A and F A.

The ⊢ relation based on tableaux is defined as follows: Γ ⊢ A
iff there is some finite set Γ₀ = {B₁, ..., Bₙ} ⊆ Γ such that there
is a closed tableau for the assumptions


{F A, T B₁, ..., T Bₙ}


For instance, here is a closed tableau that shows that ⊢ (A ∧ B) → A:


1.  F (A ∧ B) → A     Assumption
2.  T A ∧ B           →F 1
3.  F A               →F 1
4.  T A               ∧T 2
5.  T B               ∧T 2
    ⊗


A set Γ is inconsistent in the tableau calculus if there is a
closed tableau for assumptions


{T B₁, ..., T Bₙ}


for some Bᵢ ∈ Γ.

Tableaux were invented in the 1950s independently by Ev- 
ert Beth and Jaakko Hintikka, and simplified and popularized 
by Raymond Smullyan. They are very easy to use, since con- 
structing a tableau is a very systematic procedure. Because of 
the systematic nature of tableaux, they also lend themselves to 
implementation by computer. However, tableaux are often hard
to read and their connection to proofs is sometimes not easy to
see. The approach is also quite general, and many different logics
have tableau systems. Tableaux also help us to find structures that 
satisfy given (sets of) sentences: if the set is satisfiable, it won’t 
have a closed tableau, i.e., any tableau will have an open branch. 
The satisfying structure can be “read off” an open branch, pro- 
vided every rule it is possible to apply has been applied on that 
branch. There is also a very close connection to the sequent cal- 
culus: essentially, a closed tableau is a condensed derivation in 
the sequent calculus, written upside-down. 
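
To illustrate how systematic the procedure is, here is a minimal
Python sketch of a tableau test for propositional sentences. It is an
illustration under stated assumptions, not the official definition of
the calculus: the tuple encoding of formulas and the names `closes`
and `derivable` are made up for this example, quantifier rules are
left out, and a branch is only closed on a matching pair of signed
atoms (which suffices once every propositional rule has been applied).

```python
# Formulas are nested tuples (an encoding chosen for this sketch):
#   ("atom", "A"), ("not", X), ("and", X, Y), ("or", X, Y), ("imp", X, Y)
# A signed formula is a pair (sign, formula); True stands for T, False for F.

def closes(todo, literals=frozenset()):
    """Return True iff every branch of the tableau grown from `todo` closes.

    `todo` lists the signed formulas still to be expanded on the current
    branch; `literals` holds the signed atoms already on that branch.
    """
    if not todo:
        return False                       # fully expanded branch stays open
    (sign, formula), rest = todo[0], todo[1:]
    op = formula[0]
    if op == "atom":
        if (not sign, formula) in literals:
            return True                    # T A and F A on one branch: closed
        return closes(rest, literals | {(sign, formula)})
    if op == "not":
        return closes([(not sign, formula[1])] + rest, literals)
    if op == "and":
        parts, splits = [(sign, formula[1]), (sign, formula[2])], not sign
    elif op == "or":
        parts, splits = [(sign, formula[1]), (sign, formula[2])], sign
    elif op == "imp":
        parts, splits = [(not sign, formula[1]), (sign, formula[2])], sign
    else:
        raise ValueError(f"unknown connective: {op}")
    if splits:                             # the branch splits in two
        return all(closes([part] + rest, literals) for part in parts)
    return closes(parts + rest, literals)  # both formulas join the same branch


def derivable(premises, conclusion):
    """Start from {F A, T B1, ..., T Bn} and test whether the tableau closes."""
    start = [(False, conclusion)] + [(True, b) for b in premises]
    return closes(start)


A, B = ("atom", "A"), ("atom", "B")
print(derivable([], ("imp", ("and", A, B), A)))   # the example from the text
print(derivable([("or", A, B), ("not", A)], B))
print(derivable([], ("imp", A, B)))               # has an open branch
```

When `closes` reaches an empty `todo` list, the signed atoms collected
in `literals` describe an open branch, from which a satisfying
valuation could be read off.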


9.5 Axiomatic Derivations


Axiomatic derivations are the oldest and simplest logical
derivation systems. Their derivations are simply sequences of sentences.
A sequence of sentences counts as a correct derivation if every
sentence A in it satisfies one of the following conditions:


1. A is an axiom, or
2. A is an element of a given set Γ of sentences, or
3. A is justified by a rule of inference.


To be an axiom, A has to have the form of one of a number of fixed 
sentence schemas. There are many sets of axiom schemas that 
provide a satisfactory (sound and complete) derivation system for 
first-order logic. Some are organized according to the connectives 
they govern, e.g., the schemas 


A → (B → A)        B → (B ∨ C)        (B ∧ C) → B


are common axioms that govern →, ∨ and ∧. Some axiom
systems aim at a minimal number of axioms. Depending on the
connectives that are taken as primitives, it is even possible to 
find axiom systems that consist of a single axiom. 

A rule of inference is a conditional statement that gives a 
sufficient condition for a sentence in a derivation to be justified. 
Modus ponens is one very common such rule: it says that if A
and A → B are already justified, then B is justified. This means
that a line in a derivation containing the sentence B is justified,
provided that both A and A → B (for some sentence A) appear
in the derivation before B.

The ⊢ relation based on axiomatic derivations is defined as
follows: Γ ⊢ A iff there is a derivation with the sentence A as
its last formula (and Γ is taken as the set of sentences in that
derivation which are justified by (2) above). A is a theorem if A
has a derivation where Γ is empty, i.e., every sentence in the
derivation is justified either by (1) or (3). For instance, here is
a derivation that shows that ⊢ A → (B → (B ∨ A)):


1. B → (B ∨ A)
2. (B → (B ∨ A)) → (A → (B → (B ∨ A)))
3. A → (B → (B ∨ A))


The sentence on line 1 is of the form of the axiom A → (A ∨ B)
(with the roles of A and B reversed). The sentence on line 2 is of
the form of the axiom A → (B → A). Thus, both lines are justified.
Line 3 is justified by modus ponens: if we abbreviate it as D, then
line 2 has the form C → D, where C is B → (B ∨ A), i.e., line 1.
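
The three conditions can be checked mechanically. The following
Python sketch verifies the derivation above; it is only an
illustration, and it assumes a tuple encoding of formulas, just the
two axiom schemas used in this example, and modus ponens as the only
rule. The names `match`, `check`, and `AXIOM_SCHEMAS` are hypothetical.

```python
# Formulas are nested tuples (an encoding chosen for this sketch):
#   ("atom", "A"), ("imp", X, Y), ("or", X, Y)
# Schematic letters in axiom schemas are written ("meta", "A").

def imp(x, y):
    return ("imp", x, y)

MA, MB = ("meta", "A"), ("meta", "B")
AXIOM_SCHEMAS = [
    imp(MA, imp(MB, MA)),       # A -> (B -> A)
    imp(MA, ("or", MA, MB)),    # A -> (A v B)
]

def match(schema, formula, binding):
    """Extend `binding` so that `schema` instantiates to `formula`, else None."""
    if schema[0] == "meta":
        name = schema[1]
        if name in binding:
            return binding if binding[name] == formula else None
        return {**binding, name: formula}
    if schema[0] == "atom":
        return binding if schema == formula else None
    if schema[0] != formula[0] or len(schema) != len(formula):
        return None
    for schema_part, formula_part in zip(schema[1:], formula[1:]):
        binding = match(schema_part, formula_part, binding)
        if binding is None:
            return None
    return binding

def check(derivation, premises=()):
    """Return the justification of each line; raise ValueError if one has none."""
    reasons = []
    for number, line in enumerate(derivation, start=1):
        earlier = derivation[:number - 1]
        if any(match(s, line, {}) is not None for s in AXIOM_SCHEMAS):
            reasons.append("axiom")                 # condition (1)
        elif line in premises:
            reasons.append("premise")               # condition (2)
        elif any(imp(a, line) in earlier for a in earlier):
            reasons.append("modus ponens")          # condition (3)
        else:
            raise ValueError(f"line {number} is not justified")
    return reasons

A, B = ("atom", "A"), ("atom", "B")
line1 = imp(B, ("or", B, A))
line2 = imp(line1, imp(A, line1))
line3 = imp(A, line1)
print(check([line1, line2, line3]))   # ['axiom', 'axiom', 'modus ponens']
```

Checking a given sequence is easy in this way; finding a derivation
in the first place is the hard part.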

A set Γ is inconsistent if Γ ⊢ ⊥. A complete axiom system
will also prove that ⊥ → A for any A, and so if Γ is inconsistent,
then Γ ⊢ A for any A.

Systems of axiomatic derivations for logic were first given by 
Gottlob Frege in his 1879 Begriffsschrift, which for this reason is 
often considered the first work of modern logic. They were
perfected in Alfred North Whitehead and Bertrand Russell’s
Principia Mathematica and by David Hilbert and his students in the
1920s. They are thus often called “Frege systems” or “Hilbert
systems.” They are very versatile in that it is often easy to find 
an axiomatic system for a logic. Because derivations have a very 
simple structure and only one or two inference rules, it is also rel- 
atively easy to prove things about them. However, they are very 
hard to use in practice, i.e., it is difficult to find and write proofs. 


CHAPTER 10 


The Sequent 
Calculus 


10.1 Rules and Derivations 


For the following, let Γ, Δ, Π, Λ represent finite sequences of
sentences.


Definition 10.1 (Sequent). A sequent is an expression of the
form

    Γ ⇒ Δ


where Γ and Δ are finite (possibly empty) sequences of sentences
of the language ℒ. Γ is called the antecedent, while Δ is the
succedent.


The intuitive idea behind a sequent is: if all of the sentences
in the antecedent hold, then at least one of the sentences
in the succedent holds. That is, if Γ = ⟨A₁, ..., Aₘ⟩ and
Δ = ⟨B₁, ..., Bₙ⟩, then Γ ⇒ Δ holds iff


(A₁ ∧ ··· ∧ Aₘ) → (B₁ ∨ ··· ∨ Bₙ)


holds. There are two special cases: when Γ is empty and when
Δ is empty. When Γ is empty, i.e., m = 0, ⇒ Δ holds iff
B₁ ∨ ··· ∨ Bₙ holds. When Δ is empty, i.e., n = 0, Γ ⇒ holds iff
¬(A₁ ∧ ··· ∧ Aₘ) does. We say a sequent is valid iff the corresponding
sentence is valid.
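
For propositional sentences, this reading of sequents can be tested
by brute force. The Python sketch below is an illustration only; the
tuple encoding of formulas and the function names are assumptions of
this example. It treats an empty antecedent as a true conjunction and
an empty succedent as a false disjunction, which matches the two
special cases just described.

```python
from itertools import product

# Formulas are nested tuples (an encoding chosen for this sketch):
#   ("atom", "A"), ("not", X), ("and", X, Y), ("or", X, Y), ("imp", X, Y)

def atoms(formula):
    """Names of the atoms occurring in a formula."""
    if formula[0] == "atom":
        return {formula[1]}
    return set().union(*(atoms(part) for part in formula[1:]))

def true_in(formula, valuation):
    """Truth value of a formula under a valuation (dict: atom name -> bool)."""
    op = formula[0]
    if op == "atom":
        return valuation[formula[1]]
    values = [true_in(part, valuation) for part in formula[1:]]
    if op == "not":
        return not values[0]
    if op == "and":
        return values[0] and values[1]
    if op == "or":
        return values[0] or values[1]
    if op == "imp":
        return (not values[0]) or values[1]
    raise ValueError(f"unknown connective: {op}")

def sequent_holds(antecedent, succedent, valuation):
    """If all antecedent sentences are true, at least one succedent sentence is."""
    return (not all(true_in(a, valuation) for a in antecedent)
            or any(true_in(b, valuation) for b in succedent))

def sequent_valid(antecedent, succedent):
    """Check the sequent under every valuation of its atoms."""
    sentences = list(antecedent) + list(succedent)
    names = sorted(set().union(*(atoms(s) for s in sentences)))
    return all(
        sequent_holds(antecedent, succedent, dict(zip(names, row)))
        for row in product([False, True], repeat=len(names))
    )

A, B = ("atom", "A"), ("atom", "B")
print(sequent_valid([("and", A, B)], [A]))       # A ∧ B ⇒ A
print(sequent_valid([], [("or", A, ("not", A))]))  # ⇒ A ∨ ¬A
print(sequent_valid([A], [B]))                   # A ⇒ B
```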

If Γ is a sequence of sentences, we write Γ, A for the result
of appending A to the right end of Γ (and A, Γ for the result of
appending A to the left end of Γ). If Δ is a sequence of sentences
also, then Γ, Δ is the concatenation of the two sequences.


Definition 10.2 (Initial Sequent). An initial sequent is a se- 
quent of one of the following forms: 


1. A ⇒ A


2. ⊥ ⇒


for any sentence A in the language. 


Derivations in the sequent calculus are certain trees of se- 
quents, where the topmost sequents are initial sequents, and if a 
sequent stands below one or two other sequents, it must follow 
correctly by a rule of inference. The rules for LK are divided 
into two main types: logical rules and structural rules. The log- 
ical rules are named for the main operator of the sentence con- 
taining A and/or B in the lower sequent. Each one comes in two 
versions, one for inferring a sequent with the sentence containing 
the logical operator on the left, and one with the sentence on the 
right. 


10.2 Propositional Rules 


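Rules for ¬


     Γ ⇒ Δ, A                       A, Γ ⇒ Δ
  -------------- ¬L              -------------- ¬R
    ¬A, Γ ⇒ Δ                      Γ ⇒ Δ, ¬A


Rules for ∧


      A, Γ ⇒ Δ                    Γ ⇒ Δ, A    Γ ⇒ Δ, B
  ---------------- ∧L            ----------------------- ∧R
    A ∧ B, Γ ⇒ Δ                      Γ ⇒ Δ, A ∧ B

      B, Γ ⇒ Δ
  ---------------- ∧L
    A ∧ B, Γ ⇒ Δ


Rules for ∨


   A, Γ ⇒ Δ    B, Γ ⇒ Δ              Γ ⇒ Δ, A
  ----------------------- ∨L      ---------------- ∨R
       A ∨ B, Γ ⇒ Δ                Γ ⇒ Δ, A ∨ B

                                     Γ ⇒ Δ, B
                                  ---------------- ∨R
                                   Γ ⇒ Δ, A ∨ B


Rules for →


   Γ ⇒ Δ, A    B, Π ⇒ Λ             A, Γ ⇒ Δ, B
  ----------------------- →L      ---------------- →R
    A → B, Γ, Π ⇒ Δ, Λ             Γ ⇒ Δ, A → B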


10.3 Quantifier Rules 


Rules for ∀


     A(t), Γ ⇒ Δ                   Γ ⇒ Δ, A(a)
  ------------------ ∀L         ------------------ ∀R
   ∀x A(x), Γ ⇒ Δ                Γ ⇒ Δ, ∀x A(x)


In ∀L, t is a closed term (i.e., one without variables). In ∀R,
a is a constant symbol which must not occur anywhere in the
lower sequent of the ∀R rule. We call a the eigenvariable of the
∀R inference. (We use the term “eigenvariable” even though a in
the above rule is a constant symbol. This has historical reasons.)


Rules for ∃


     A(a), Γ ⇒ Δ                   Γ ⇒ Δ, A(t)
  ------------------ ∃L         ------------------ ∃R
   ∃x A(x), Γ ⇒ Δ                Γ ⇒ Δ, ∃x A(x)


Again, t is a closed term, and a is a constant symbol which
does not occur in the lower sequent of the ∃L rule. We call a the
eigenvariable of the ∃L inference.

The condition that an eigenvariable not occur in the lower
sequent of the ∀R or ∃L inference is called the eigenvariable
condition.

Recall the convention that when A is a formula with the variable
x free, we indicate this by writing A(x). In the same context,
A(t) then is short for A[t/x]. So we could also write the ∃R rule
as:


    Γ ⇒ Δ, A[t/x]
  ------------------ ∃R
    Γ ⇒ Δ, ∃x A


Note that t may already occur in A, e.g., A might be P(t, x). Thus,
inferring Γ ⇒ Δ, ∃x P(t, x) from Γ ⇒ Δ, P(t, t) is a correct
application of ∃R: you may “replace” one or more, and not necessarily
all, occurrences of t in the premise by the bound variable x.
However, the eigenvariable conditions in ∀R and ∃L require that
the constant symbol a does not occur in A. So, you cannot
correctly infer Γ ⇒ Δ, ∀x P(a, x) from Γ ⇒ Δ, P(a, a) using ∀R.
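
The eigenvariable condition is a purely syntactic test, so it can be
checked by a program. The following Python sketch is an illustration
under simplifying assumptions: formulas are encoded as tuples, terms
are plain strings (constant symbols or variables, no function
symbols), and the names `occurs`, `substitute`, and `forall_right`
are made up for this example. Given the lower sequent of a ∀R
inference and a candidate eigenvariable, it returns the premise, or
`None` if the condition is violated.

```python
# Formulas are nested tuples (an encoding chosen for this sketch):
#   ("pred", "P", "a", "x")    atomic formula P(a, x); terms are strings
#   ("not", X), ("and", X, Y), ("or", X, Y), ("imp", X, Y)
#   ("forall", "x", X), ("exists", "x", X)

def occurs(symbol, formula):
    """Does the constant symbol or variable `symbol` occur in `formula`?"""
    op = formula[0]
    if op == "pred":
        return symbol in formula[2:]
    if op in ("forall", "exists"):
        return occurs(symbol, formula[2])
    return any(occurs(symbol, part) for part in formula[1:])

def substitute(formula, variable, term):
    """A[t/x]: replace the free occurrences of `variable` by a closed term."""
    op = formula[0]
    if op == "pred":
        return formula[:2] + tuple(term if t == variable else t
                                   for t in formula[2:])
    if op in ("forall", "exists"):
        if formula[1] == variable:     # rebound here: no free occurrences below
            return formula
        return (op, formula[1], substitute(formula[2], variable, term))
    return (op,) + tuple(substitute(part, variable, term)
                         for part in formula[1:])

def forall_right(gamma, delta, quantified, a):
    """Premise of a ∀R inference with eigenvariable `a`, or None.

    The lower sequent is  gamma ⇒ delta, quantified  where `quantified`
    is ("forall", x, body). The eigenvariable condition requires that
    `a` does not occur anywhere in this lower sequent.
    """
    op, variable, body = quantified
    if op != "forall":
        return None
    lower = list(gamma) + list(delta) + [quantified]
    if any(occurs(a, formula) for formula in lower):
        return None
    return (list(gamma), list(delta) + [substitute(body, variable, a)])

bad = ("forall", "x", ("pred", "P", "a", "x"))    # ∀x P(a, x)
good = ("forall", "x", ("pred", "P", "x", "x"))   # ∀x P(x, x)
print(forall_right([], [], bad, "a"))    # a occurs in the lower sequent
print(forall_right([], [], good, "a"))   # premise is ⇒ P(a, a)
```

With a fresh constant, say `"b"`, the first call succeeds as well and
yields the premise ⇒ P(a, b).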

In ∃R and ∀L there are no restrictions on the term t. On
the other hand, in the ∃L and ∀R rules, the eigenvariable condition
requires that the constant symbol a does not occur anywhere
outside of A(a) in the upper sequent. This condition is necessary to
ensure that the system is sound, i.e., only derives sequents that are
valid. Without this condition, the following would be allowed:


      A(a) ⇒ A(a)                       A(a) ⇒ A(a)
  -------------------- *∃L          -------------------- *∀R
    ∃x A(x) ⇒ A(a)                    A(a) ⇒ ∀x A(x)
  ---------------------- ∀R         ---------------------- ∃L
   ∃x A(x) ⇒ ∀x A(x)                 ∃x A(x) ⇒ ∀x A(x)


However, ∃x A(x) ⇒ ∀x A(x) is not valid.


10.4 Structural Rules


We also need a few rules that allow us to rearrange sentences in 
the left and right side of a sequent. Since the logical rules require 
that the sentences in the premise which the rule acts upon stand 
either to the far left or to the far right, we need an “exchange” 
rule that allows us to move sentences to the right position. It’s 
also important sometimes to be able to combine two identical 
sentences into one, and to add a sentence on either side. 


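Weakening


      Γ ⇒ Δ                       Γ ⇒ Δ
  ------------ WL              ------------ WR
    A, Γ ⇒ Δ                    Γ ⇒ Δ, A


Contraction


   A, A, Γ ⇒ Δ                  Γ ⇒ Δ, A, A
  --------------- CL           --------------- CR
     A, Γ ⇒ Δ                     Γ ⇒ Δ, A


Exchange


   Γ, A, B, Π ⇒ Δ               Γ ⇒ Δ, A, B, Λ
  ------------------ XL        ------------------ XR
   Γ, B, A, Π ⇒ Δ               Γ ⇒ Δ, B, A, Λ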


A series of weakening, contraction, and exchange inferences 
will often be indicated by double inference lines. 

The following rule, called “cut,” is not strictly speaking
necessary, but makes it a lot easier to reuse and combine
derivations.


   Γ ⇒ Δ, A    A, Π ⇒ Λ
  ----------------------- Cut
       Γ, Π ⇒ Δ, Λ


10.5 Derivations


We’ve said what an initial sequent looks like, and we’ve given 
the rules of inference. Derivations in the sequent calculus are 
inductively generated from these: each derivation either is an 
initial sequent on its own, or consists of one or two derivations 
followed by an inference. 


Definition 10.3 (LK derivation). An LK-derivation of a se- 
quent S is a finite tree of sequents satisfying the following condi- 
tions: 


1. The topmost sequents of the tree are initial sequents.
2. The bottommost sequent of the tree is S.
3. Every sequent in the tree except S is a premise of a correct
application of an inference rule whose conclusion stands
directly below that sequent in the tree.


We then say that S is the end-sequent of the derivation and that S
is derivable in LK (or LK-derivable).
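
Since a derivation is a finite tree whose correctness depends only on
the form of its sequents, the conditions of this definition can be
checked by a program. The Python sketch below is an illustration, not
a full implementation of LK: it assumes a tuple encoding of formulas
and trees, it covers only the rules WL, XL, and ∧R, and the names
`is_initial`, `follows`, and `is_derivation` are hypothetical.

```python
# A sequent is a pair (antecedent, succedent) of tuples of formulas.
# A derivation is a triple (sequent, rule_name, list_of_subderivations).
# Formulas are nested tuples such as ("atom", "C") or ("and", X, Y).

BOTTOM = ("bot",)

def is_initial(sequent):
    """Initial sequents have the form A ⇒ A or ⊥ ⇒."""
    gamma, delta = sequent
    return (len(gamma) == 1 and gamma == delta) or (
        gamma == (BOTTOM,) and delta == ())

def follows(rule, uppers, lower):
    """Is `lower` obtained from the sequents `uppers` by `rule`?"""
    gamma, delta = lower
    if rule == "WL" and len(uppers) == 1:
        g, d = uppers[0]
        return d == delta and len(gamma) == len(g) + 1 and gamma[1:] == g
    if rule == "XL" and len(uppers) == 1:
        g, d = uppers[0]
        return d == delta and any(
            g[:i] + (g[i + 1], g[i]) + g[i + 2:] == gamma
            for i in range(len(g) - 1)
        )
    if rule == "andR" and len(uppers) == 2:
        (g1, d1), (g2, d2) = uppers
        if not (d1 and d2 and delta):
            return False
        return (g1 == gamma and g2 == gamma
                and d1[:-1] == delta[:-1] and d2[:-1] == delta[:-1]
                and delta[-1] == ("and", d1[-1], d2[-1]))
    return False          # rules not covered by this sketch are rejected

def is_derivation(tree):
    """Topmost sequents are initial; every other sequent follows by a rule."""
    sequent, rule, subtrees = tree
    if rule == "init":
        return not subtrees and is_initial(sequent)
    return (all(is_derivation(t) for t in subtrees)
            and follows(rule, [t[0] for t in subtrees], sequent))

C, D = ("atom", "C"), ("atom", "D")
left = (((C, D), (C,)), "XL",
        [(((D, C), (C,)), "WL", [(((C,), (C,)), "init", [])])])
right = (((C, D), (D,)), "WL", [(((D,), (D,)), "init", [])])
whole = (((C, D), (("and", C, D),)), "andR", [left, right])
print(is_derivation(whole))   # a derivation of C, D ⇒ C ∧ D
```

The end-sequent of a checked tree is simply its first component,
here `whole[0]`.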


Example 10.4. Every initial sequent, e.g., C ⇒ C, is a derivation.
We can obtain a new derivation from this by applying, say,
the WL rule,


      Γ ⇒ Δ
  ------------ WL
    A, Γ ⇒ Δ


The rule, however, is meant to be general: we can replace the A
in the rule with any sentence, e.g., also with D. If the premise
matches our initial sequent C ⇒ C, that means that both Γ and
Δ are just C, and the conclusion would then be D, C ⇒ C. So,
the following is a derivation:


      C ⇒ C
  ------------ WL
    D, C ⇒ C


We can now apply another rule, say XL, which allows us to switch
two sentences on the left. So, the following is also a correct
derivation:


      C ⇒ C
  ------------ WL
    D, C ⇒ C
  ------------ XL
    C, D ⇒ C


In this application of the rule, which was given as


   Γ, A, B, Π ⇒ Δ
  ------------------ XL
   Γ, B, A, Π ⇒ Δ


both Γ and Π were empty, Δ is C, and the roles of A and B are
played by D and C, respectively. In much the same way, we also
see that


      D ⇒ D
  ------------ WL
    C, D ⇒ D


is a derivation. Now we can take these two derivations, and
combine them using ∧R. That rule was


   Γ ⇒ Δ, A    Γ ⇒ Δ, B
  ----------------------- ∧R
       Γ ⇒ Δ, A ∧ B


In our case, the premises must match the last sequents of the
derivations ending in the premises. That means that Γ is C, D, Δ
is empty, A is C and B is D. So the conclusion, if the inference
should be correct, is C, D ⇒ C ∧ D.


      C ⇒ C
  ------------ WL
    D, C ⇒ C               D ⇒ D
  ------------ XL       ------------ WL
    C, D ⇒ C              C, D ⇒ D
  ----------------------------------- ∧R
            C, D ⇒ C ∧ D


Of course, we can also reverse the premises, then A would be D
and B would be C.


                            C ⇒ C
                        ------------ WL
      D ⇒ D               D, C ⇒ C
  ------------ WL       ------------ XL
    C, D ⇒ D              C, D ⇒ C
  ----------------------------------- ∧R
            C, D ⇒ D ∧ C


10.6 Examples of Derivations


Example 10.5. Give an LK-derivation for the sequent A ∧ B ⇒ A.

We begin by writing the desired end-sequent at the bottom of
the derivation.


    A ∧ B ⇒ A


Next, we need to figure out what kind of inference could have
a lower sequent of this form. This could be a structural rule,
but it is a good idea to start by looking for a logical rule. The
only logical connective occurring in the lower sequent is ∧, so
we’re looking for an ∧ rule, and since the ∧ symbol occurs in the
antecedent, we’re looking at the ∧L rule.


  ------------- ∧L
    A ∧ B ⇒ A


There are two options for what could have been the upper sequent
of the ∧L inference: we could have an upper sequent of A ⇒ A,
or of B ⇒ A. Clearly, A ⇒ A is an initial sequent (which is a
good thing), while B ⇒ A is not derivable in general. We fill in
the upper sequent:


      A ⇒ A
  ------------- ∧L
    A ∧ B ⇒ A


We now have a correct LK-derivation of the sequent A ∧ B ⇒ A.


Example 10.6. Give an LK-derivation for the sequent ¬A ∨ B ⇒
A → B.

Begin by writing the desired end-sequent at the bottom of the
derivation.


    ¬A ∨ B ⇒ A → B


To find a logical rule that could give us this end-sequent, we look
at the logical connectives in the end-sequent: ¬, ∨, and →. We
only care at the moment about ∨ and → because they are main
operators of sentences in the end-sequent, while ¬ is inside the
scope of another connective, so we will take care of it later. Our
options for logical rules for the final inference are therefore the
∨L rule and the →R rule. We could pick either rule, really, but
let’s pick the →R rule (if for no reason other than it allows us
to put off splitting into two branches). According to the form of
→R inferences which can yield the lower sequent, this must look
like:


    A, ¬A ∨ B ⇒ B
  ------------------ →R
    ¬A ∨ B ⇒ A → B


If we move ¬A ∨ B to the outside of the antecedent, we can
apply the ∨L rule. According to the schema, this must split into
two upper sequents as follows:


   ¬A, A ⇒ B      B, A ⇒ B
  -------------------------- ∨L
        ¬A ∨ B, A ⇒ B
       ----------------- XL
        A, ¬A ∨ B ⇒ B
       ------------------ →R
        ¬A ∨ B ⇒ A → B


Remember that we are trying to wind our way up to initial
sequents; we seem to be pretty close! The right branch is just one
weakening and one exchange away from an initial sequent and
then it is done:


                      B ⇒ B
                   ------------ WL
                     A, B ⇒ B
                   ------------ XL
   ¬A, A ⇒ B         B, A ⇒ B
  ------------------------------ ∨L
          ¬A ∨ B, A ⇒ B
         ----------------- XL
          A, ¬A ∨ B ⇒ B
         ------------------ →R
          ¬A ∨ B ⇒ A → B


Now looking at the left branch, the only logical connective
in any sentence is the ¬ symbol in the antecedent sentences, so
we’re looking at an instance of the ¬L rule.


                          B ⇒ B
                       ------------ WL
    A ⇒ B, A             A, B ⇒ B
  ------------ ¬L      ------------ XL
   ¬A, A ⇒ B             B, A ⇒ B
  ---------------------------------- ∨L
            ¬A ∨ B, A ⇒ B
           ----------------- XL
            A, ¬A ∨ B ⇒ B
           ------------------ →R
            ¬A ∨ B ⇒ A → B


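Similarly to how we finished off the right branch, we are just
one weakening and one exchange away from finishing off this
left branch as well.


      A ⇒ A
  ------------ WR
    A ⇒ A, B               B ⇒ B
  ------------ XR       ------------ WL
    A ⇒ B, A              A, B ⇒ B
  ------------ ¬L       ------------ XL
   ¬A, A ⇒ B              B, A ⇒ B
  ----------------------------------- ∨L
            ¬A ∨ B, A ⇒ B
           ----------------- XL
            A, ¬A ∨ B ⇒ B
           ------------------ →R
            ¬A ∨ B ⇒ A → B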
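
The bottom-up strategy used in these examples can be automated for
propositional sequents. The Python sketch below is an illustration
and deliberately does not implement LK as given above. Instead it
assumes a common variant of the rules in which ∧ on the left and ∨ on
the right keep both components, the context is shared between
premises, and the order of sentences is ignored, so that no
structural rules are needed and every rule can be applied backwards
without losing derivability. The function name `provable` and the
tuple encoding of formulas are made up for this example; the function
only answers yes or no and does not construct an LK-derivation.

```python
# Formulas are nested tuples (an encoding chosen for this sketch):
#   ("atom", "A"), ("not", X), ("and", X, Y), ("or", X, Y), ("imp", X, Y)

def provable(gamma, delta):
    """Backward proof search for the propositional sequent gamma ⇒ delta."""
    gamma, delta = list(gamma), list(delta)
    # Reduce the first compound sentence on the left, if there is one.
    for i, f in enumerate(gamma):
        if f[0] == "atom":
            continue
        rest = gamma[:i] + gamma[i + 1:]
        if f[0] == "not":
            return provable(rest, delta + [f[1]])
        if f[0] == "and":
            return provable([f[1], f[2]] + rest, delta)
        if f[0] == "or":
            return (provable([f[1]] + rest, delta)
                    and provable([f[2]] + rest, delta))
        if f[0] == "imp":
            return (provable(rest, delta + [f[1]])
                    and provable([f[2]] + rest, delta))
        raise ValueError(f"unknown connective: {f[0]}")
    # Otherwise reduce the first compound sentence on the right.
    for i, f in enumerate(delta):
        if f[0] == "atom":
            continue
        rest = delta[:i] + delta[i + 1:]
        if f[0] == "not":
            return provable(gamma + [f[1]], rest)
        if f[0] == "and":
            return (provable(gamma, rest + [f[1]])
                    and provable(gamma, rest + [f[2]]))
        if f[0] == "or":
            return provable(gamma, rest + [f[1], f[2]])
        if f[0] == "imp":
            return provable(gamma + [f[1]], rest + [f[2]])
        raise ValueError(f"unknown connective: {f[0]}")
    # Only atoms remain: succeed iff some atom occurs on both sides.
    return any(a in delta for a in gamma)

A, B = ("atom", "A"), ("atom", "B")
print(provable([("or", ("not", A), B)], [("imp", A, B)]))   # ¬A ∨ B ⇒ A → B
print(provable([("imp", A, B)], [A]))                       # A → B ⇒ A
```

Every recursive call removes one connective, so the search always
terminates.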


Example 10.7. Give an LK-derivation of the sequent ¬A ∨
¬B ⇒ ¬(A ∧ B).

Using the techniques from above, we start by writing the
desired end-sequent at the bottom.


    ¬A ∨ ¬B ⇒ ¬(A ∧ B)


The available main connectives of sentences in the end-sequent
are the ∨ symbol and the ¬ symbol. It would work to apply either
the ∨L or the ¬R rule here, but we start with the ¬R rule because
it avoids splitting up into two branches for a moment:


    A ∧ B, ¬A ∨ ¬B ⇒
  ---------------------- ¬R
    ¬A ∨ ¬B ⇒ ¬(A ∧ B)


Now we have a choice of whether to look at the ∧L or the ∨L
rule. Let’s see what happens when we apply the ∧L rule: we have
a choice to start with either the sequent A, ¬A ∨ ¬B ⇒ or the
sequent B, ¬A ∨ ¬B ⇒. Since the derivation is symmetric with
regards to A and B, let’s go with the former:


    A, ¬A ∨ ¬B ⇒
  -------------------- ∧L
    A ∧ B, ¬A ∨ ¬B ⇒
  ---------------------- ¬R
    ¬A ∨ ¬B ⇒ ¬(A ∧ B)


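Continuing to fill in the derivation, we see that we run into a
problem:


                           ?
      A ⇒ A              A ⇒ B
  ----------- ¬L      ----------- ¬L
    ¬A, A ⇒             ¬B, A ⇒
  -------------------------------- ∨L
          ¬A ∨ ¬B, A ⇒
         ----------------- XL
          A, ¬A ∨ ¬B ⇒
         -------------------- ∧L
          A ∧ B, ¬A ∨ ¬B ⇒
         ---------------------- ¬R
          ¬A ∨ ¬B ⇒ ¬(A ∧ B)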


The top of the right branch cannot be reduced any further, and
it cannot be brought by way of structural inferences to an initial
sequent, so this is not the right path to take. So clearly, it was a
mistake to apply the ∧L rule above. Going back to what we had
before and carrying out the ∨L rule instead, we get


   ¬A, A ∧ B ⇒      ¬B, A ∧ B ⇒
  -------------------------------- ∨L
         ¬A ∨ ¬B, A ∧ B ⇒
        -------------------- XL
         A ∧ B, ¬A ∨ ¬B ⇒
        ---------------------- ¬R
         ¬A ∨ ¬B ⇒ ¬(A ∧ B)


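Completing each branch as we’ve done before, we get


       A ⇒ A                  B ⇒ B
  ------------- ∧L       ------------- ∧L
    A ∧ B ⇒ A              A ∧ B ⇒ B
  --------------- ¬L     --------------- ¬L
    ¬A, A ∧ B ⇒            ¬B, A ∧ B ⇒
  --------------------------------------- ∨L
            ¬A ∨ ¬B, A ∧ B ⇒
           -------------------- XL
            A ∧ B, ¬A ∨ ¬B ⇒
           ---------------------- ¬R
            ¬A ∨ ¬B ⇒ ¬(A ∧ B)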

(We could have carried out the ∧ rules lower than the ¬ rules in
these steps and still obtained a correct derivation.)


Example 10.8. So far we haven’t used the contraction rule, but
it is sometimes required. Here’s an example where that happens.
Suppose we want to prove ⇒ A ∨ ¬A. Applying ∨R backwards
would give us one of these two derivations:


                              A ⇒
                          ----------- ¬R
       ⇒ A                   ⇒ ¬A
  ------------ ∨R         ------------ ∨R
    ⇒ A ∨ ¬A                ⇒ A ∨ ¬A


Neither of these of course ends in an initial sequent. The trick
is to realize that the contraction rule allows us to combine two
copies of a sentence into one, and when we’re searching for a
proof, i.e., going from bottom to top, we can keep a copy of
A ∨ ¬A in the premise, e.g.,


    ⇒ A ∨ ¬A, A
  --------------------- ∨R
    ⇒ A ∨ ¬A, A ∨ ¬A
  --------------------- CR
    ⇒ A ∨ ¬A


Now we can apply ∨R a second time, and also get ¬A, which
leads to a complete derivation.


       A ⇒ A
  ---------------- ¬R
     ⇒ A, ¬A
  ---------------- ∨R
    ⇒ A, A ∨ ¬A
  ---------------- XR
    ⇒ A ∨ ¬A, A
  --------------------- ∨R
    ⇒ A ∨ ¬A, A ∨ ¬A
  --------------------- CR
    ⇒ A ∨ ¬A


10.7 Derivations with Quantifiers


Example 10.9. Give an LK-derivation of the sequent
∃x ¬A(x) ⇒ ¬∀x A(x).

When dealing with quantifiers, we have to make sure not to
violate the eigenvariable condition, and sometimes this requires
us to play around with the order of carrying out certain
inferences. In general, it helps to try and take care of rules subject
to the eigenvariable condition first (they will be lower down in
the finished proof). Also, it is a good idea to try and look ahead
and try to guess what the initial sequent might look like. In our
case, it will have to be something like A(a) ⇒ A(a). That means
that when we are “reversing” the quantifier rules, we will have to
pick the same term (what we will call a) for both the ∀ and the
∃ rule. If we picked different terms for each rule, we would end
up with something like A(a) ⇒ A(b), which, of course, is not
derivable.
Starting as usual, we write


    ∃x ¬A(x) ⇒ ¬∀x A(x)


We could either carry out the SL rule or the =R rule. Since the 
AL rule is subject to the eigenvariable condition, it’s a good idea 
to take care of it sooner rather than later, so we’ll do that one 
first. 


3A(a) = 7AVx A(x) 


dx AA(x) = 7AVx A(x) ah 


Applying the ¬L and ¬R rules backwards, we get


    ∀x A(x) ⇒ A(a)
   -------------------- ¬L
   ¬A(a), ∀x A(x) ⇒
   -------------------- XL
   ∀x A(x), ¬A(a) ⇒
   -------------------- ¬R
   ¬A(a) ⇒ ¬∀x A(x)
  ---------------------- ∃L
  ∃x ¬A(x) ⇒ ¬∀x A(x)


At this point, our only option is to carry out the ∀L rule. Since
this rule is not subject to the eigenvariable restriction, we’re in the
clear. Remember, we want to try and obtain an initial sequent (of
the form A(a) ⇒ A(a)), so we should choose a as our argument
for A when we apply the rule.


      A(a) ⇒ A(a)
    ------------------ ∀L
    ∀x A(x) ⇒ A(a)
   -------------------- ¬L
   ¬A(a), ∀x A(x) ⇒
   -------------------- XL
   ∀x A(x), ¬A(a) ⇒
   -------------------- ¬R
   ¬A(a) ⇒ ¬∀x A(x)
  ---------------------- ∃L
  ∃x ¬A(x) ⇒ ¬∀x A(x)


It is important, especially when dealing with quantifiers, to
double check at this point that the eigenvariable condition has not
been violated. Since the only rule we applied that is subject to
the eigenvariable condition was ∃L, and the eigenvariable a does
not occur in its lower sequent (the end-sequent), this is a correct
derivation.
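
The double check described above is mechanical, so it can be sketched in code. The following Python fragment is only an illustration under an assumed encoding: atomic formulas are tuples `(predicate, term, ...)` with terms given by name, and compound formulas are tuples tagged `"not"`, `"and"`, `"or"`, `"imp"`, `"forall"`, `"exists"`. The names `free_names` and `eigenvariable_ok` are hypothetical helpers, not notions from the text.

```python
# Sketch of an eigenvariable check under the tuple encoding described above.
CONNECTIVES = {"and", "or", "imp"}
QUANTIFIERS = {"forall", "exists"}

def free_names(f, bound=frozenset()):
    """Names of constants (and free variables) occurring in formula f."""
    tag = f[0]
    if tag == "not":
        return free_names(f[1], bound)
    if tag in CONNECTIVES:
        return free_names(f[1], bound) | free_names(f[2], bound)
    if tag in QUANTIFIERS:
        # f = (quantifier, variable, body): the variable is bound in the body
        return free_names(f[2], bound | {f[1]})
    # atomic formula: (predicate, term, term, ...)
    return {t for t in f[1:] if t not in bound}

def eigenvariable_ok(a, antecedent, succedent):
    """True iff the constant a occurs nowhere in the lower sequent."""
    return all(a not in free_names(f) for f in antecedent + succedent)

# Lower sequent of the EXISTS-L inference in Example 10.9.
end_antecedent = [("exists", "x", ("not", ("A", "x")))]
end_succedent = [("not", ("forall", "x", ("A", "x")))]
print(eigenvariable_ok("a", end_antecedent, end_succedent))

# Inferring  exists x A(x) => A(a)  from  A(a) => A(a)  would violate it.
print(eigenvariable_ok("a", [("exists", "x", ("A", "x"))], [("A", "a")]))
```

The first call reports that the condition holds for the derivation above; the second shows a lower sequent in which a still occurs, so the corresponding ∃L step would not be allowed.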


10.8 Proof-Theoretic Notions 


Just as we’ve defined a number of important semantic notions
(validity, entailment, satisfiability), we now define corresponding
proof-theoretic notions. These are not defined by appeal to
satisfaction of sentences in structures, but by appeal to the derivability
or non-derivability of certain sequents. It was an important
discovery that these notions coincide. That they do is the content
of the soundness and completeness theorems.


Definition 10.10 (Theorems). A sentence A is a theorem if there
is a derivation in LK of the sequent ⇒ A. We write ⊢ A if A is
a theorem and ⊬ A if it is not.


Definition 10.11 (Derivability). A sentence A is derivable from
a set of sentences Γ, Γ ⊢ A, iff there is a finite subset Γ₀ ⊆ Γ
and a sequence Γ₀′ of the sentences in Γ₀ such that LK derives
Γ₀′ ⇒ A. If A is not derivable from Γ we write Γ ⊬ A.


Because of the contraction, weakening, and exchange rules,
the order and number of sentences in Γ₀′ does not matter: if a
sequent Γ₀′ ⇒ A is derivable, then so is Γ₀″ ⇒ A for any Γ₀″
that contains the same sentences as Γ₀′. For instance, if Γ₀ =
{B, C} then both Γ₀′ = ⟨B, B, C⟩ and Γ₀″ = ⟨C, C, B⟩ are sequences
containing just the sentences in Γ₀. If a sequent containing one
is derivable, so is the other, e.g.:


  B, B, C ⇒ A
  -------------- CL
    B, C ⇒ A
  -------------- XL
    C, B ⇒ A
  -------------- WL
  C, C, B ⇒ A


From now on we'll say that if Γ₀ is a finite set of sentences then
Γ₀ ⇒ A is any sequent where the antecedent is a sequence of
sentences in Γ₀ and tacitly include contractions, exchanges, and
weakenings if necessary.
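
The small derivation above can be replayed step by step on antecedents represented as Python lists. This is a minimal sketch; it assumes the usual shapes of the structural rules (WL adds a sentence at the left end, CL merges two equal sentences at the left end, XL swaps two neighbours), and the function names are hypothetical.

```python
# Sketch: antecedents as Python lists, read left to right.

def weaken_left(antecedent, sentence):
    """WL: add a sentence at the left end."""
    return [sentence] + antecedent

def contract_left(antecedent):
    """CL: merge the two leftmost sentences, which must be equal."""
    if len(antecedent) < 2 or antecedent[0] != antecedent[1]:
        raise ValueError("CL needs two equal sentences at the left end")
    return antecedent[1:]

def exchange_left(antecedent, i):
    """XL: swap the sentences at positions i and i + 1."""
    if not 0 <= i < len(antecedent) - 1:
        raise ValueError("no neighbouring pair at this position")
    swapped = list(antecedent)
    swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
    return swapped

steps = [["B", "B", "C"]]
steps.append(contract_left(steps[-1]))      # CL: B, C
steps.append(exchange_left(steps[-1], 0))   # XL: C, B
steps.append(weaken_left(steps[-1], "C"))   # WL: C, C, B

for antecedent in steps:
    print(", ".join(antecedent), "=> A")
```

Every antecedent in `steps` contains exactly the sentences B and C, which is why the text can treat the antecedent as a set from here on.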


Definition 10.12 (Consistency). A set of sentences Γ is
inconsistent iff there is a finite subset Γ₀ ⊆ Γ such that LK derives
Γ₀ ⇒ . If Γ is not inconsistent, i.e., if for every finite Γ₀ ⊆ Γ,
LK does not derive Γ₀ ⇒ , we say it is consistent.


Proposition 10.13 (Reflexivity). If A ∈ Γ, then Γ ⊢ A.

Proof. The initial sequent A ⇒ A is derivable, and {A} ⊆ Γ. □


Proposition 10.14 (Monotony). If Γ ⊆ Δ and Γ ⊢ A, then Δ ⊢ A.

Proof. Suppose Γ ⊢ A, i.e., there is a finite Γ₀ ⊆ Γ such that
Γ₀ ⇒ A is derivable. Since Γ ⊆ Δ, then Γ₀ is also a finite subset
of Δ. The derivation of Γ₀ ⇒ A thus also shows Δ ⊢ A. □


Proposition 10.15 (Transitivity). If Γ ⊢ A and {A} ∪ Δ ⊢ B, then
Γ ∪ Δ ⊢ B.

Proof. If Γ ⊢ A, there is a finite Γ₀ ⊆ Γ and a derivation π₀ of
Γ₀ ⇒ A. If {A} ∪ Δ ⊢ B, then for some finite subset Δ₀ ⊆ Δ,
there is a derivation π₁ of A, Δ₀ ⇒ B. Consider the following
derivation:


     π₀              π₁
   Γ₀ ⇒ A        A, Δ₀ ⇒ B
  --------------------------- Cut
         Γ₀, Δ₀ ⇒ B


Since Γ₀ ∪ Δ₀ ⊆ Γ ∪ Δ, this shows Γ ∪ Δ ⊢ B. □


Note that this means that in particular if Γ ⊢ A and A ⊢ B,
then Γ ⊢ B. It follows also that if A₁, ..., Aₙ ⊢ B and Γ ⊢ Aᵢ for
each i, then Γ ⊢ B.


Proposition 10.16. Γ is inconsistent iff Γ ⊢ A for every
sentence A.

Proof. Exercise. □


Proposition 10.17 (Compactness).

1. If Γ ⊢ A then there is a finite subset Γ₀ ⊆ Γ such that
   Γ₀ ⊢ A.

2. If every finite subset of Γ is consistent, then Γ is consistent.

Proof. 1. If Γ ⊢ A, then there is a finite subset Γ₀ ⊆ Γ such
   that the sequent Γ₀ ⇒ A has a derivation. Consequently,
   Γ₀ ⊢ A.


2. If Γ is inconsistent, there is a finite subset Γ₀ ⊆ Γ such that
   LK derives Γ₀ ⇒ . But then Γ₀ is a finite subset of Γ that
   is inconsistent. □


10.9 Derivability and Consistency


We will now establish a number of properties of the derivability 
relation. They are independently interesting, but each will play 
a role in the proof of the completeness theorem. 


Proposition 10.18. If Γ ⊢ A and Γ ∪ {A} is inconsistent, then Γ
is inconsistent.

Proof. There are finite Γ₀ and Γ₁ ⊆ Γ such that LK derives
Γ₀ ⇒ A and A, Γ₁ ⇒ . Let the LK-derivation of Γ₀ ⇒ A be π₀ and
the LK-derivation of A, Γ₁ ⇒ be π₁. We can then derive


     π₀             π₁
   Γ₀ ⇒ A        A, Γ₁ ⇒
  ------------------------- Cut
         Γ₀, Γ₁ ⇒


Since Γ₀ ⊆ Γ and Γ₁ ⊆ Γ, Γ₀ ∪ Γ₁ ⊆ Γ, hence Γ is
inconsistent. □


Proposition 10.19. Γ ⊢ A iff Γ ∪ {¬A} is inconsistent.

Proof. First suppose Γ ⊢ A, i.e., there is a derivation π₀ of Γ ⇒ A.
By adding a ¬L rule, we obtain a derivation of ¬A, Γ ⇒ , i.e.,
Γ ∪ {¬A} is inconsistent.

If Γ ∪ {¬A} is inconsistent, there is a derivation π₁ of ¬A, Γ ⇒ .
The following is a derivation of Γ ⇒ A:


     A ⇒ A
  ------------ ¬R        π₁
   ⇒ A, ¬A           ¬A, Γ ⇒
  ----------------------------- Cut
            Γ ⇒ A

□


Proposition 10.20. If Γ ⊢ A and ¬A ∈ Γ, then Γ is inconsistent.

Proof. Suppose Γ ⊢ A and ¬A ∈ Γ. Then there is a derivation π
of a sequent Γ₀ ⇒ A. The sequent ¬A, Γ₀ ⇒ is also derivable
(up to an exchange):


                    A ⇒ A
                  ----------- ¬L
     π             ¬A, A ⇒
                  ----------- XL
   Γ₀ ⇒ A         A, ¬A ⇒
  -------------------------- Cut
          Γ₀, ¬A ⇒


Since ¬A ∈ Γ and Γ₀ ⊆ Γ, this shows that Γ is inconsistent. □


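Proposition 10.21. If Γ ∪ {A} and Γ ∪ {¬A} are both
inconsistent, then Γ is inconsistent.

Proof. There are finite sets Γ₀ ⊆ Γ and Γ₁ ⊆ Γ and
LK-derivations π₀ and π₁ of A, Γ₀ ⇒ and ¬A, Γ₁ ⇒ , respectively.
We can then derive


      π₀
   A, Γ₀ ⇒
  ----------- ¬R         π₁
   Γ₀ ⇒ ¬A           ¬A, Γ₁ ⇒
  ------------------------------ Cut
           Γ₀, Γ₁ ⇒


Since Γ₀ ⊆ Γ and Γ₁ ⊆ Γ, Γ₀ ∪ Γ₁ ⊆ Γ. Hence Γ is inconsistent. □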


10.10 Derivability and the Propositional 
Connectives 


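We establish that the derivability relation ⊢ of the sequent
calculus is strong enough to establish some basic facts involving the
propositional connectives, such as that A ∧ B ⊢ A and A, A → B ⊢ B
(modus ponens). These facts are needed for the proof of the
completeness theorem.


Proposition 10.22.

1. Both A ∧ B ⊢ A and A ∧ B ⊢ B.

2. A, B ⊢ A ∧ B.

Proof. 1. Both sequents A ∧ B ⇒ A and A ∧ B ⇒ B are
   derivable:


     A ⇒ A                 B ⇒ B
  ------------ ∧L       ------------ ∧L
  A ∧ B ⇒ A             A ∧ B ⇒ B


2. Here is a derivation of the sequent A, B ⇒ A ∧ B. The
   weakening and exchange steps bring both premises of ∧R to
   the same antecedent:


     A ⇒ A                B ⇒ B
  ------------ WL      ------------ WL
   B, A ⇒ A             A, B ⇒ B
  ------------ XL
   A, B ⇒ A
  ---------------------------------- ∧R
          A, B ⇒ A ∧ B

□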


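Proposition 10.23.

1. {A ∨ B, ¬A, ¬B} is inconsistent.

2. Both A ⊢ A ∨ B and B ⊢ A ∨ B.

Proof. 1. We give a derivation of the sequent A ∨ B, ¬A, ¬B ⇒ :


      A ⇒ A                  B ⇒ B
   ----------- ¬L         ----------- ¬L
    ¬A, A ⇒                ¬B, B ⇒
  ===============        ===============
  A, ¬A, ¬B ⇒            B, ¬A, ¬B ⇒
  ------------------------------------- ∨L
          A ∨ B, ¬A, ¬B ⇒


   (Recall that double inference lines indicate several
   weakening, contraction, and exchange inferences.)


2. Both sequents A ⇒ A ∨ B and B ⇒ A ∨ B have derivations:


     A ⇒ A                 B ⇒ B
  ------------ ∨R       ------------ ∨R
  A ⇒ A ∨ B             B ⇒ A ∨ B

□


Proposition 10.24.

1. A, A → B ⊢ B.

2. Both ¬A ⊢ A → B and B ⊢ A → B.

Proof. 1. The sequent A → B, A ⇒ B is derivable:


  A ⇒ A      B ⇒ B
  ------------------ →L
   A → B, A ⇒ B


2. Both sequents ¬A ⇒ A → B and B ⇒ A → B are derivable:


      A ⇒ A
    ----------- ¬L
     ¬A, A ⇒
    ----------- XL
     A, ¬A ⇒                  B ⇒ B
   ------------- WR        ------------ WL
   A, ¬A ⇒ B                A, B ⇒ B
  --------------- →R      -------------- →R
  ¬A ⇒ A → B               B ⇒ A → B

□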


10.11 Derivability and the Quantifiers 


The completeness theorem also requires that the sequent calculus
rules yield the facts about ⊢ established in this section.


Theorem 10.25. If c is a constant not occurring in Γ or A(x) and
Γ ⊢ A(c), then Γ ⊢ ∀x A(x).

Proof. Let π₀ be an LK-derivation of Γ₀ ⇒ A(c) for some finite
Γ₀ ⊆ Γ. By adding a ∀R inference, we obtain a derivation of
Γ₀ ⇒ ∀x A(x), since c does not occur in Γ or A(x) and thus the
eigenvariable condition is satisfied. □


Proposition 10.26.

1. A(t) ⊢ ∃x A(x).

2. ∀x A(x) ⊢ A(t).

Proof. 1. The sequent A(t) ⇒ ∃x A(x) is derivable:


     A(t) ⇒ A(t)
  ------------------ ∃R
  A(t) ⇒ ∃x A(x)


2. The sequent ∀x A(x) ⇒ A(t) is derivable:


     A(t) ⇒ A(t)
  ------------------ ∀L
  ∀x A(x) ⇒ A(t)

□


10.12 Soundness 


A derivation system, such as the sequent calculus, is sound if 
it cannot derive things that do not actually hold. Soundness is 
thus a kind of guaranteed safety property for derivation systems. 
Depending on which proof theoretic property is in question, we 
would like to know for instance, that 


1. every derivable A is valid; 


2. if a sentence is derivable from some others, it is also a 
consequence of them; 


3. if a set of sentences is inconsistent, it is unsatisfiable. 


These are important properties of a derivation system. If any of 
them do not hold, the derivation system is deficient—it would 
derive too much. Consequently, establishing the soundness of a 
derivation system is of the utmost importance. 

Because all these proof-theoretic properties are defined via
derivability in the sequent calculus of certain sequents, proving
(1) through (3) above requires proving something about the
semantic properties of derivable sequents. We will first define what it
means for a sequent to be valid, and then show that every
derivable sequent is valid. (1) through (3) then follow as corollaries
from this result.

Definition 10.27. A structure M satisfies a sequent Γ ⇒ Δ iff
either M ⊭ A for some A ∈ Γ or M ⊨ A for some A ∈ Δ.


A sequent is valid iff every structure M satisfies it.
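
For sequents built from propositional connectives only, Definition 10.27 can be tried out directly, with truth-value assignments playing the role of structures. The following Python sketch makes that simplifying assumption; the tuple encoding of formulas and the names `holds`, `satisfies` and `valid` are chosen here for illustration.

```python
from itertools import product

# Encoding (assumed): an atom is a string; compound formulas are tuples
# ("not", F), ("and", F, G), ("or", F, G), ("imp", F, G).

def holds(f, v):
    """Truth value of formula f under valuation v (a dict atom -> bool)."""
    if isinstance(f, str):
        return v[f]
    op = f[0]
    if op == "not":
        return not holds(f[1], v)
    if op == "and":
        return holds(f[1], v) and holds(f[2], v)
    if op == "or":
        return holds(f[1], v) or holds(f[2], v)
    if op == "imp":
        return (not holds(f[1], v)) or holds(f[2], v)
    raise ValueError(f"unknown connective: {op}")

def atoms(f):
    """Set of atoms occurring in formula f."""
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(g) for g in f[1:]))

def satisfies(v, gamma, delta):
    """v satisfies gamma => delta iff some formula in gamma is false
    under v or some formula in delta is true under v."""
    return any(not holds(f, v) for f in gamma) or any(holds(f, v) for f in delta)

def valid(gamma, delta):
    """A sequent is valid iff every valuation satisfies it."""
    names = sorted(set().union(*(atoms(f) for f in gamma + delta)))
    return all(
        satisfies(dict(zip(names, values)), gamma, delta)
        for values in product([True, False], repeat=len(names))
    )

# The end-sequent of Example 10.7:  not-A or not-B  =>  not (A and B)
print(valid([("or", ("not", "A"), ("not", "B"))], [("not", ("and", "A", "B"))]))
print(valid(["A"], ["B"]))   # A => B is not valid
print(valid([], []))         # the empty sequent is satisfied by no valuation
```

Note the last case: the empty sequent has nothing on the left to be false and nothing on the right to be true, so nothing satisfies it. This matches the reading of Γ₀ ⇒ with empty succedent as a statement of inconsistency.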


Theorem 10.28 (Soundness). If LK derives Θ ⇒ Ξ, then Θ ⇒ Ξ is
valid.

Proof. Let π be a derivation of Θ ⇒ Ξ. We proceed by induction
on the number of inferences n in π.

If the number of inferences is 0, then π consists only of an
initial sequent. Every initial sequent A ⇒ A is obviously valid,
since for every M, either M ⊭ A or M ⊨ A.

If the number of inferences is greater than 0, we distinguish
cases according to the type of the lowermost inference. By
induction hypothesis, we can assume that the premises of that inference
are valid, since the number of inferences in the derivation of any
premise is smaller than n.

First, we consider the possible inferences with only one
premise.


1. The last inference is a weakening. Then Θ ⇒ Ξ is either
   A, Γ ⇒ Δ (if the last inference is WL) or Γ ⇒ Δ, A (if it’s
   WR), and the derivation ends in one of


     Γ ⇒ Δ                Γ ⇒ Δ
  ------------ WL      ------------ WR
   A, Γ ⇒ Δ             Γ ⇒ Δ, A


   By induction hypothesis, Γ ⇒ Δ is valid, i.e., for every
   structure M, either there is some C ∈ Γ such that M ⊭ C
   or there is some C ∈ Δ such that M ⊨ C.


   If M ⊭ C for some C ∈ Γ, then C ∈ Θ as well, since Θ is
   either Γ or A, Γ, and so M ⊭ C for some C ∈ Θ. Similarly,
   if M ⊨ C for some C ∈ Δ, then C ∈ Ξ as well, since Ξ is
   either Δ or Δ, A, and so M ⊨ C for some C ∈ Ξ. Consequently,
   Θ ⇒ Ξ is valid.


2. The last inference is ¬L: Then the premise of the last
   inference is Γ ⇒ Δ, A and the conclusion is ¬A, Γ ⇒ Δ, i.e.,
   the derivation ends in


    Γ ⇒ Δ, A
  ------------- ¬L
   ¬A, Γ ⇒ Δ


   and Θ = ¬A, Γ while Ξ = Δ.


   The induction hypothesis tells us that Γ ⇒ Δ, A is valid,
   i.e., for every M, either (a) for some C ∈ Γ, M ⊭ C, or (b)
   for some C ∈ Δ, M ⊨ C, or (c) M ⊨ A. We want to show
   that Θ ⇒ Ξ is also valid. Let M be a structure. If (a) holds,
   then there is C ∈ Γ so that M ⊭ C, but C ∈ Θ as well. If
   (b) holds, there is C ∈ Δ such that M ⊨ C, but C ∈ Ξ as
   well. Finally, if M ⊨ A, then M ⊭ ¬A. Since ¬A ∈ Θ, there
   is C ∈ Θ such that M ⊭ C. Consequently, Θ ⇒ Ξ is valid.


3. The last inference is ¬R: Exercise.


4. The last inference is ∧L: There are two variants: A ∧ B may
   be inferred on the left from A or from B on the left side of
   the premise. In the first case, π ends in


     A, Γ ⇒ Δ
  ---------------- ∧L
  A ∧ B, Γ ⇒ Δ


   and Θ = A ∧ B, Γ while Ξ = Δ. Consider a structure M.
   Since by induction hypothesis, A, Γ ⇒ Δ is valid, (a) M ⊭ A,
   (b) M ⊭ C for some C ∈ Γ, or (c) M ⊨ C for some C ∈ Δ. In
   case (a), M ⊭ A ∧ B, so there is C ∈ Θ (namely, A ∧ B) such
   that M ⊭ C. In case (b), there is C ∈ Γ such that M ⊭ C,
   and C ∈ Θ as well. In case (c), there is C ∈ Δ such that
   M ⊨ C, and C ∈ Ξ as well since Ξ = Δ. So in each case,
   M satisfies A ∧ B, Γ ⇒ Δ. Since M was arbitrary,
   A ∧ B, Γ ⇒ Δ is valid. The case where A ∧ B is inferred
   from B is handled the same, changing A to B.


5. The last inference is ∨R: There are two variants: A ∨ B may
   be inferred on the right from A or from B on the right side
   of the premise. In the first case, π ends in


     Γ ⇒ Δ, A
  ---------------- ∨R
  Γ ⇒ Δ, A ∨ B


   Now Θ = Γ and Ξ = Δ, A ∨ B. Consider a structure M.
   Since Γ ⇒ Δ, A is valid, (a) M ⊨ A, (b) M ⊭ C for some
   C ∈ Γ, or (c) M ⊨ C for some C ∈ Δ. In case (a), M ⊨ A ∨ B.
   In case (b), there is C ∈ Γ such that M ⊭ C. In case (c),
   there is C ∈ Δ such that M ⊨ C. So in each case, M satisfies
   Γ ⇒ Δ, A ∨ B, i.e., Θ ⇒ Ξ. Since M was arbitrary, Θ ⇒ Ξ
   is valid. The case where A ∨ B is inferred from B is handled
   the same, changing A to B.


6. The last inference is →R: Then π ends in


    A, Γ ⇒ Δ, B
  ----------------- →R
  Γ ⇒ Δ, A → B


   Again, the induction hypothesis says that the premise is
   valid; we want to show that the conclusion is valid as well.
   Let M be arbitrary. Since A, Γ ⇒ Δ, B is valid, at least one
   of the following cases obtains: (a) M ⊭ A, (b) M ⊨ B, (c)
   M ⊭ C for some C ∈ Γ, or (d) M ⊨ C for some C ∈ Δ. In
   cases (a) and (b), M ⊨ A → B and so there is a C ∈ Δ, A → B
   such that M ⊨ C. In case (c), for some C ∈ Γ, M ⊭ C. In
   case (d), for some C ∈ Δ, M ⊨ C. In each case, M satisfies
   Γ ⇒ Δ, A → B. Since M was arbitrary, Γ ⇒ Δ, A → B is
   valid.


7. The last inference is ∀L: Then there is a formula A(x) and
   a closed term t such that π ends in


     A(t), Γ ⇒ Δ
  ------------------- ∀L
  ∀x A(x), Γ ⇒ Δ


   We want to show that the conclusion ∀x A(x), Γ ⇒ Δ is
   valid. Consider a structure M. Since the premise
   A(t), Γ ⇒ Δ is valid, (a) M ⊭ A(t), (b) M ⊭ C for some
   C ∈ Γ, or (c) M ⊨ C for some C ∈ Δ. In case (a), by
   Proposition 7.30, if M ⊨ ∀x A(x), then M ⊨ A(t). Since
   M ⊭ A(t), M ⊭ ∀x A(x). In cases (b) and (c), M also
   satisfies ∀x A(x), Γ ⇒ Δ. Since M was arbitrary,
   ∀x A(x), Γ ⇒ Δ is valid.


8. The last inference is ∃R: Exercise.


9. The last inference is ∀R: Then there is a formula A(x) and
   a constant symbol a such that π ends in


     Γ ⇒ Δ, A(a)
  ------------------- ∀R
  Γ ⇒ Δ, ∀x A(x)


   where the eigenvariable condition is satisfied, i.e., a does
   not occur in A(x), Γ, or Δ. By induction hypothesis, the
   premise of the last inference is valid. We have to show that
   the conclusion is valid as well, i.e., that for any structure M,
   (a) M ⊨ ∀x A(x), (b) M ⊭ C for some C ∈ Γ, or (c) M ⊨ C
   for some C ∈ Δ.


   Suppose M is an arbitrary structure. If (b) or (c) holds, we
   are done, so suppose neither holds: for all C ∈ Γ, M ⊨ C,
   and for all C ∈ Δ, M ⊭ C. We have to show that (a) holds,
   i.e., M ⊨ ∀x A(x). By Proposition 7.18, it suffices to show
   that M, s ⊨ A(x) for all variable assignments s. So let s be an
   arbitrary variable assignment. Consider the structure M′
   which is just like M except a^M′ = s(x). By Corollary 7.20,
   for any C ∈ Γ, M′ ⊨ C since a does not occur in Γ, and
   for any C ∈ Δ, M′ ⊭ C. But the premise is valid, so
   M′ ⊨ A(a). By Proposition 7.17, M′, s ⊨ A(a), since A(a) is a
   sentence. Now s ∼ₓ s with s(x) = Val^M′_s(a), since we’ve
   defined M′ in just this way. So Proposition 7.22 applies,
   and we get M′, s ⊨ A(x). Since a does not occur in A(x),
   by Proposition 7.19, M, s ⊨ A(x). Since s was arbitrary,
   we've completed the proof that M, s ⊨ A(x) for all variable
   assignments.


10. The last inference is ∃L: Exercise.

Now let’s consider the possible inferences with two premises.


1. The last inference is a cut: then π ends in


  Γ ⇒ Δ, A      A, Π ⇒ Λ
  ------------------------- Cut
       Γ, Π ⇒ Δ, Λ


   Let M be a structure. By induction hypothesis, the premises
   are valid, so M satisfies both premises. We distinguish two
   cases: (a) M ⊭ A and (b) M ⊨ A. In case (a), in order for M
   to satisfy the left premise, it must satisfy Γ ⇒ Δ. But then
   it also satisfies the conclusion. In case (b), in order for M
   to satisfy the right premise, it must satisfy Π ⇒ Λ. Again,
   M satisfies the conclusion.


2. The last inference is ∧R. Then π ends in


  Γ ⇒ Δ, A      Γ ⇒ Δ, B
  ------------------------- ∧R
       Γ ⇒ Δ, A ∧ B


   Consider a structure M. If M satisfies Γ ⇒ Δ, we are
   done. So suppose it doesn’t. Since Γ ⇒ Δ, A is valid by
   induction hypothesis, M ⊨ A. Similarly, since Γ ⇒ Δ, B is
   valid, M ⊨ B. But then M ⊨ A ∧ B.


3. The last inference is ∨L: Exercise.


4. The last inference is →L. Then π ends in


  Γ ⇒ Δ, A      B, Π ⇒ Λ
  ------------------------- →L
   A → B, Γ, Π ⇒ Δ, Λ


   Again, consider a structure M and suppose M doesn’t
   satisfy Γ, Π ⇒ Δ, Λ. We have to show that M ⊭ A → B. If M
   doesn’t satisfy Γ, Π ⇒ Δ, Λ, it satisfies neither Γ ⇒ Δ nor
   Π ⇒ Λ. Since Γ ⇒ Δ, A is valid, we have M ⊨ A. Since
   B, Π ⇒ Λ is valid, we have M ⊭ B. But then M ⊭ A → B,
   which is what we wanted to show. □
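
The propositional cases of the proof all have the same shape: any structure that satisfies the premises of an inference also satisfies its conclusion. For the two-premise rules this can be sanity-checked by brute force. The Python sketch below is an illustration, not a proof: it assumes that single atoms G, D, P, L may stand in for the contexts Γ, Δ, Π, Λ, and all names in it are hypothetical.

```python
from itertools import product

def atom(name):
    return lambda v: v[name]

def conj(f, g): return lambda v: f(v) and g(v)
def disj(f, g): return lambda v: f(v) or g(v)
def imp(f, g):  return lambda v: (not f(v)) or g(v)

def satisfies(v, sequent):
    """v satisfies (gamma, delta) iff something in gamma is false under v
    or something in delta is true under v."""
    gamma, delta = sequent
    return any(not f(v) for f in gamma) or any(f(v) for f in delta)

NAMES = ["G", "D", "P", "L", "A", "B"]   # G, D, P, L stand in for the contexts
G, D, P, L, A, B = (atom(n) for n in NAMES)

def preserves_satisfaction(premises, conclusion):
    """True iff every valuation satisfying all premises satisfies the conclusion."""
    for values in product([True, False], repeat=len(NAMES)):
        v = dict(zip(NAMES, values))
        if all(satisfies(v, p) for p in premises) and not satisfies(v, conclusion):
            return False
    return True

# Each rule is a pair (list of premise sequents, conclusion sequent).
cut   = ([([G], [D, A]), ([A, P], [L])], ([G, P], [D, L]))
and_r = ([([G], [D, A]), ([G], [D, B])], ([G], [D, conj(A, B)]))
imp_l = ([([G], [D, A]), ([B, P], [L])], ([imp(A, B), G, P], [D, L]))
# A made-up, unsound "rule" for contrast: from G => D, A or B infer G => D, A.
bogus = ([([G], [D, disj(A, B)])], ([G], [D, A]))

for name, rule in [("Cut", cut), ("and-R", and_r), ("imp-L", imp_l), ("bogus", bogus)]:
    print(name, preserves_satisfaction(*rule))
```

The three genuine rules pass and the made-up rule fails. The quantifier rules with an eigenvariable condition are different in kind: as case 9 shows, their justification has to pass to a modified structure M′, so a valuation-by-valuation check like this one does not apply to them.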




Corollary 10.29. If ⊢ A then A is valid.


Corollary 10.30. If Γ ⊢ A then Γ ⊨ A.

Proof. If Γ ⊢ A then for some finite subset Γ₀ ⊆ Γ, there is
a derivation of Γ₀ ⇒ A. By Theorem 10.28, every structure M
either makes some B ∈ Γ₀ false or makes A true. Hence, if M ⊨ Γ
then also M ⊨ A. □


Corollary 10.31. If Γ is satisfiable, then it is consistent.

Proof. We prove the contrapositive. Suppose that Γ is not
consistent. Then there is a finite Γ₀ ⊆ Γ and a derivation of Γ₀ ⇒ . By
Theorem 10.28, Γ₀ ⇒ is valid. In other words, for every
structure M, there is C ∈ Γ₀ so that M ⊭ C, and since Γ₀ ⊆ Γ, that C
is also in Γ. Thus, no M satisfies Γ, and Γ is not satisfiable. □


10.13 Derivations with Identity predicate 


Derivations with identity predicate require additional initial se- 
quents and inference rules. 


Definition 10.32 (Initial sequents for =). If t is a closed term,
then ⇒ t = t is an initial sequent.


The rules for = are (t₁ and t₂ are closed terms):


  t₁ = t₂, Γ ⇒ Δ, A(t₁)           t₁ = t₂, Γ ⇒ Δ, A(t₂)
  ------------------------ =      ------------------------ =
  t₁ = t₂, Γ ⇒ Δ, A(t₂)           t₁ = t₂, Γ ⇒ Δ, A(t₁)


Example 10.33. If s and t are closed terms, then s = t, A(s) ⊢
A(t):


      A(s) ⇒ A(s)
  --------------------- WL
  s = t, A(s) ⇒ A(s)
  --------------------- =
  s = t, A(s) ⇒ A(t)


This may be familiar as the principle of substitutability of
identicals, or Leibniz’ Law.

LK proves that = is symmetric and transitive:


                               t₁ = t₂ ⇒ t₁ = t₂
       ⇒ t₁ = t₁            ----------------------------- WL
  -------------------- WL   t₂ = t₃, t₁ = t₂ ⇒ t₁ = t₂
  t₁ = t₂ ⇒ t₁ = t₁         ----------------------------- =
  -------------------- =    t₂ = t₃, t₁ = t₂ ⇒ t₁ = t₃
  t₁ = t₂ ⇒ t₂ = t₁         ----------------------------- XL
                            t₁ = t₂, t₂ = t₃ ⇒ t₁ = t₃


In the derivation on the left, the formula x = t₁ is our A(x). On
the right, we take A(x) to be t₁ = x.


10.14 Soundness with Identity predicate 


Proposition 10.34. LK with initial sequents and rules for identity
is sound.

Proof. Initial sequents of the form ⇒ t = t are valid, since for
every structure M, M ⊨ t = t. (Note that we assume the term t to
be closed, i.e., it contains no variables, so variable assignments
are irrelevant.)

Suppose the last inference in a derivation is =. Then the
premise is t₁ = t₂, Γ ⇒ Δ, A(t₁) and the conclusion is
t₁ = t₂, Γ ⇒ Δ, A(t₂). Consider a structure M. We need to show
that M satisfies the conclusion, i.e., if M ⊨ t₁ = t₂ and M ⊨ Γ,
then either M ⊨ C for some C ∈ Δ or M ⊨ A(t₂).

By induction hypothesis, the premise is valid. This means
that if M ⊨ t₁ = t₂ and M ⊨ Γ, either (a) for some C ∈ Δ, M ⊨ C
or (b) M ⊨ A(t₁). In case (a) we are done. Consider case
(b). Let s be a variable assignment with s(x) = Val^M(t₁). By
Proposition 7.17, M, s ⊨ A(t₁). Since s ∼ₓ s, by Proposition 7.22,
M, s ⊨ A(x). Since M ⊨ t₁ = t₂, we have Val^M(t₁) = Val^M(t₂), and
hence s(x) = Val^M(t₂). By applying Proposition 7.22 again, we
also have M, s ⊨ A(t₂). By Proposition 7.17, M ⊨ A(t₂). □


Summary


Proof systems provide purely syntactic methods for
characterizing consequence and compatibility between sentences. The
sequent calculus is one such proof system. A derivation in it
consists of a tree of sequents (a sequent Γ ⇒ Δ consists of two
sequences of formulas separated by ⇒). The topmost sequents
in a derivation are initial sequents of the form A ⇒ A. All other
sequents, for the derivation to be correct, must be correctly
justified by one of a number of inference rules. These come in
pairs: a rule for operating on the left and on the right side of
a sequent for each connective and quantifier. For instance, if a
sequent Γ ⇒ Δ, A → B is justified by the →R rule, the preceding
sequent (the premise) must be A, Γ ⇒ Δ, B. Some rules also
allow the order or number of sentences in a sequent to be
manipulated, e.g., the XR rule allows two formulas on the right side of
a sequent to be switched.

If there is a derivation of the sequent ⇒ A, we say A is a
theorem and write ⊢ A. If there is a derivation of Γ₀ ⇒ A where
every B in Γ₀ is in Γ, we say A is derivable from Γ and write
Γ ⊢ A. If there is a derivation of Γ₀ ⇒ where every B in Γ₀
is in Γ, we say Γ is inconsistent, otherwise consistent. These
notions are interrelated, e.g., Γ ⊢ A iff Γ ∪ {¬A} is inconsistent.
They are also related to the corresponding semantic notions, e.g.,
if Γ ⊢ A then Γ ⊨ A. This property of proof systems (what can
be derived from Γ is guaranteed to be entailed by Γ) is called
soundness. The soundness theorem is proved by induction on
the length of derivations, showing that each individual inference
preserves validity: the conclusion sequent is valid provided the
premise sequents are valid.


Problems 


Problem 10.1. Give derivations of the following sequents:

1. A ∧ (B ∧ C) ⇒ (A ∧ B) ∧ C.
2. A ∨ (B ∨ C) ⇒ (A ∨ B) ∨ C.
3. A → (B → C) ⇒ B → (A → C).
4. A ⇒ ¬¬A.

Problem 10.2. Give derivations of the following sequents:

1. (A ∨ B) → C ⇒ A → C.
2. (A → C) ∧ (B → C) ⇒ (A ∨ B) → C.
3. ⇒ ¬(A ∧ ¬A).
4. B → A ⇒ ¬A → ¬B.
5. ⇒ (A → ¬A) → ¬A.
6. ⇒ ¬(A → B) → ¬B.
7. A → C ⇒ ¬(A ∧ ¬C).
8. A ∧ ¬C ⇒ ¬(A → C).
9. A ∨ B, ¬B ⇒ A.
10. ¬A ∨ ¬B ⇒ ¬(A ∧ B).
11. ⇒ (¬A ∧ ¬B) → ¬(A ∨ B).
12. ⇒ ¬(A ∨ B) → (¬A ∧ ¬B).

Problem 10.3. Give derivations of the following sequents:

1. ¬(A → B) ⇒ A.
2. ¬(A ∧ B) ⇒ ¬A ∨ ¬B.
3. A → B ⇒ ¬A ∨ B.
4. ⇒ ¬¬A → A.
5. A → B, ¬A → B ⇒ B.
6. (A ∧ B) → C ⇒ (A → C) ∨ (B → C).
7. (A → B) → A ⇒ A.
8. ⇒ (A → B) ∨ (B → C).

(These all require the CR rule.)
Problem 10.4. Give derivations of the following sequents:

1. ⇒ (∀x A(x) ∧ ∀y B(y)) → ∀z (A(z) ∧ B(z)).
2. ⇒ (∃x A(x) ∨ ∃y B(y)) → ∃z (A(z) ∨ B(z)).
3. ∀x (A(x) → B) ⇒ ∃y A(y) → B.
4. ∀x ¬A(x) ⇒ ¬∃x A(x).
5. ⇒ ¬∃x A(x) → ∀x ¬A(x).
6. ⇒ ¬∃x ∀y ((A(x, y) → ¬A(y, y)) ∧ (¬A(y, y) → A(x, y))).

Problem 10.5. Give derivations of the following sequents:

1. ⇒ ¬∀x A(x) → ∃x ¬A(x).
2. (∀x A(x) → B) ⇒ ∃y (A(y) → B).
3. ⇒ ∃x (A(x) → ∀y A(y)).

(These all require the CR rule.)

Problem 10.6. Prove Proposition 10.16.

Problem 10.7. Prove that Γ ⊢ ¬A iff Γ ∪ {A} is inconsistent.

Problem 10.8. Complete the proof of Theorem 10.28.

Problem 10.9. Give derivations of the following sequents:

1. ⇒ ∀x ∀y ((x = y ∧ A(x)) → A(y))

2. ∃x A(x) ∧ ∀y ∀z ((A(y) ∧ A(z)) → y = z) ⇒ ∃x (A(x) ∧
   ∀y (A(y) → y = x))


CHAPTER 11 


Natural 
Deduction 


11.1 Rules and Derivations 


Natural deduction systems are meant to closely parallel the
informal reasoning used in mathematical proof (hence it is somewhat
“natural”). Natural deduction proofs begin with assumptions.
Inference rules are then applied. Assumptions are “discharged” by
the ¬Intro, →Intro, ∨Elim and ∃Elim inference rules, and the
label of the discharged assumption is placed beside the inference
for clarity.


Definition 11.1 (Assumption). An assumption is any sentence
in the topmost position of any branch.


Derivations in natural deduction are certain trees of
sentences, where the topmost sentences are assumptions, and if
a sentence stands below one, two, or three other sentences, it
must follow correctly by a rule of inference. The sentences at
the top of the inference are called the premises and the sentence
below the conclusion of the inference. The rules come in pairs, an
introduction and an elimination rule for each logical operator.
They introduce a logical operator in the conclusion or remove
a logical operator from a premise of the rule. Some of the rules
allow an assumption of a certain type to be discharged. To
indicate which assumption is discharged by which inference, we also
assign labels to both the assumption and the inference. This is
indicated by writing the assumption as “[A]ⁿ”.

It is customary to consider rules for all the logical operators
∧, ∨, →, ¬, and ⊥, even if some of those are defined.


11.2 Propositional Rules 


Rules for ∧


   A     B                 A ∧ B                A ∧ B
  ---------- ∧Intro       ------- ∧Elim        ------- ∧Elim
    A ∧ B                    A                    B


Rules for ∨

                                                    [A]ⁿ    [B]ⁿ
     A                    B                           ⋮       ⋮
  ------- ∨Intro       ------- ∨Intro       A ∨ B     C       C
   A ∨ B                A ∨ B             n --------------------- ∨Elim
                                                      C


Rules for →

    [A]ⁿ
      ⋮
      B                     A → B     A
  n ------- →Intro         ------------- →Elim
    A → B                        B


Rules for ¬

    [A]ⁿ
      ⋮
      ⊥                     ¬A     A
  n ------ ¬Intro          ---------- ¬Elim
     ¬A                        ⊥


Rules for ⊥

                           [¬A]ⁿ
                             ⋮
     ⊥                       ⊥
  ------- ⊥I             n ------ ⊥C
     A                       A


Note that ¬Intro and ⊥C are very similar: The difference is
that ¬Intro derives a negated sentence ¬A but ⊥C a positive
sentence A.


Whenever a rule indicates that some assumption may be
discharged, we take this to be a permission, but not a requirement.
E.g., in the →Intro rule, we may discharge any number of
assumptions of the form A in the derivation of the premise B, including
zero.


11.3 Quantifier Rules 


Rules for ∀


     A(a)                     ∀x A(x)
  ----------- ∀Intro        ----------- ∀Elim
   ∀x A(x)                      A(t)


In the rules for ∀, t is a closed term (a term that does not
contain any variables), and a is a constant symbol which does
not occur in the conclusion ∀x A(x), or in any assumption which
is undischarged in the derivation ending with the premise A(a).
We call a the eigenvariable of the ∀Intro inference. (We use the
term “eigenvariable” even though a in the above rule is a
constant. This has historical reasons.)


Rules for ∃

                                         [A(a)]ⁿ
     A(t)                                   ⋮
  ----------- ∃Intro         ∃x A(x)        C
   ∃x A(x)                 n ----------------- ∃Elim
                                    C


Again, t is a closed term, and a is a constant which does
not occur in the premise ∃x A(x), in the conclusion C, or any
assumption which is undischarged in the derivations ending with
the two premises (other than the assumptions A(a)). We call a
the eigenvariable of the ∃Elim inference.

The condition that an eigenvariable neither occur in the
premises nor in any assumption that is undischarged in the
derivations leading to the premises for the ∀Intro or ∃Elim
inference is called the eigenvariable condition.

Recall the convention that when A is a formula with the
variable x free, we indicate this by writing A(x). In the same context,
A(t) then is short for A[t/x]. So we could also write the ∃Intro
rule as:

   A[t/x]
  -------- ∃Intro
    ∃x A

Note that t may already occur in A, e.g., A might be P(t, x).
Thus, inferring ∃x P(t, x) from P(t, t) is a correct application
of ∃Intro: you may “replace” one or more, and not necessarily
all, occurrences of t in the premise by the bound variable x.
However, the eigenvariable conditions in ∀Intro and ∃Elim
require that the constant symbol a does not occur in A. So, you
cannot correctly infer ∀x P(a, x) from P(a, a) using ∀Intro.

In ∃Intro and ∀Elim there are no restrictions, and the term t
can be anything, so we do not have to worry about any conditions.
On the other hand, in the ∃Elim and ∀Intro rules, the
eigenvariable condition requires that the constant symbol a does not occur
anywhere in the conclusion or in an undischarged assumption.
The condition is necessary to ensure that the system is sound,
i.e., only derives sentences from undischarged assumptions from
which they follow. Without this condition, the following would
be allowed:


                 [A(a)]¹
                ---------- *∀Intro
    ∃x A(x)      ∀x A(x)
  1 ---------------------- ∃Elim
           ∀x A(x)


However, ∃x A(x) ⊭ ∀x A(x).
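
That the entailment fails can be seen from a small countermodel. The Python sketch below uses a hypothetical structure with two objects, only one of which is in the extension of A; the variable names are chosen for illustration.

```python
# A hypothetical two-element structure in which A holds of exactly one object.
domain = {0, 1}
extension_of_A = {0}

exists_A = any(d in extension_of_A for d in domain)   # truth value of "exists x A(x)"
forall_A = all(d in extension_of_A for d in domain)   # truth value of "forall x A(x)"

print(exists_A, forall_A)
```

Since the first sentence is true in this structure and the second is false, a derivation of ∀x A(x) from ∃x A(x) would make the system unsound, which is exactly what the eigenvariable condition rules out.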

As the elimination rules for quantifiers only allow substituting 
closed terms for variables, it follows that any formula that can be 
derived from a set of sentences is itself a sentence. 


11.4 Derivations 


We’ve said what an assumption is, and we’ve given the rules of 
inference. Derivations in natural deduction are inductively gen- 
erated from these: each derivation either is an assumption on its 
own, or consists of one, two, or three derivations followed by a 
correct inference. 


Definition 11.2 (Derivation). A derivation of a sentence A
from assumptions Γ is a finite tree of sentences satisfying the
following conditions:


1. The topmost sentences of the tree are either in Γ or are
   discharged by an inference in the tree.


2. The bottommost sentence of the tree is A.


3. Every sentence in the tree except the sentence A at the
   bottom is a premise of a correct application of an inference
   rule whose conclusion stands directly below that sentence
   in the tree.


We then say that A is the conclusion of the derivation and Γ its
undischarged assumptions.

If a derivation of A from Γ exists, we say that A is derivable
from Γ, or in symbols: Γ ⊢ A. If there is a derivation of A in
which every assumption is discharged, we write ⊢ A.


Example 11.3. Every assumption on its own is a derivation. So,
e.g., A by itself is a derivation, and so is B by itself. We can
obtain a new derivation from these by applying, say, the ∧Intro
rule,


   A     B
  ---------- ∧Intro
    A ∧ B

These rules are meant to be general: we can replace the A and B
in it with any sentences, e.g., by C and D. Then the conclusion
would be C ∧ D, and so


   C     D
  ---------- ∧Intro
    C ∧ D

is a correct derivation. Of course, we can also switch the
assumptions, so that D plays the role of A and C that of B. Thus,


   D     C
  ---------- ∧Intro
    D ∧ C

is also a correct derivation.

We can now apply another rule, say, →Intro, which allows
us to conclude a conditional and allows us to discharge any
assumption that is identical to the antecedent of that conditional.
So both of the following would be correct derivations:


     [C]¹    D                          C    [D]¹
    ----------- ∧Intro                 ----------- ∧Intro
       C ∧ D                              C ∧ D
  1 --------------- →Intro           1 --------------- →Intro
     C → (C ∧ D)                        D → (C ∧ D)


They show, respectively, that D ⊢ C → (C ∧ D) and C ⊢ D → (C ∧
D).

Remember that discharging of assumptions is a permission,
not a requirement: we don’t have to discharge the assumptions.
In particular, we can apply a rule even if the assumptions are
not present in the derivation. For instance, the following is legal,
even though there is no assumption A to be discharged:


       B
  1 ------- →Intro
     A → B
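
The bookkeeping of labels and discharged assumptions in Example 11.3 can be sketched as a small tree data structure. The Python code below is a simplified illustration: the class `Derivation` and the function `open_assumptions` are hypothetical names, sentences are plain strings, and the sketch does not check that inferences are correct applications of rules. It also lets a labelled inference discharge matching assumptions in any of its premises, which is coarser than the ∨Elim and ∃Elim rules require.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Derivation:
    conclusion: str                          # sentence at the bottom of the tree
    rule: str = "assumption"                 # last rule applied
    premises: Tuple["Derivation", ...] = ()  # immediate sub-derivations
    label: Optional[int] = None              # label of an assumption or inference
    discharges: Tuple[str, ...] = ()         # sentences this inference may discharge

def _labelled(d):
    """Pairs (sentence, label) of the assumptions still open in d."""
    if d.rule == "assumption":
        return {(d.conclusion, d.label)}
    found = set()
    for p in d.premises:
        found |= _labelled(p)
    # An inference labelled n discharges the matching assumptions labelled n.
    return {(s, n) for (s, n) in found
            if not (n is not None and n == d.label and s in d.discharges)}

def open_assumptions(d):
    """Sentences occurring as undischarged assumptions in derivation d."""
    return {sentence for sentence, _ in _labelled(d)}

# Left derivation of Example 11.3: [C]^1 and D give C -> (C and D).
c_and_d = Derivation("C ∧ D", "∧Intro", (Derivation("C", label=1), Derivation("D")))
left = Derivation("C → (C ∧ D)", "→Intro", (c_and_d,), label=1, discharges=("C",))
print(open_assumptions(left))

# Last derivation: ->Intro with no assumption A to discharge.
vacuous = Derivation("A → B", "→Intro", (Derivation("B"),), label=1, discharges=("A",))
print(open_assumptions(vacuous))
```

The first result contains only D, matching D ⊢ C → (C ∧ D). The second contains only B: the →Intro inference was permitted to discharge A but found nothing to discharge.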


11.5 Examples of Derivations 


Example 11.4. Let’s give a derivation of the sentence (A ∧ B) →
A.

We begin by writing the desired conclusion at the bottom of
the derivation.


  (A ∧ B) → A


Next, we need to figure out what kind of inference could result
in a sentence of this form. The main operator of the conclusion
is →, so we'll try to arrive at the conclusion using the →Intro
rule. It is best to write down the assumptions involved and label
the inference rules as you progress, so it is easy to see whether
all assumptions have been discharged at the end of the proof.


      [A ∧ B]¹
         ⋮
         A
  1 -------------- →Intro
     (A ∧ B) → A

We now need to fill in the steps from the assumption A ∧ B
to A. Since we only have one connective to deal with, ∧, we must
use the ∧Elim rule. This gives us the following proof:


      [A ∧ B]¹
     ----------- ∧Elim
         A
  1 -------------- →Intro
     (A ∧ B) → A


We now have a correct derivation of (A ∧ B) → A.


Example 11.5. Now let’s give a derivation of (¬A ∨ B) → (A → B).
We begin by writing the desired conclusion at the bottom of
the derivation.


  (¬A ∨ B) → (A → B)


To find a logical rule that could give us this conclusion, we look at
the logical connectives in the conclusion: ¬, ∨, and →. We only
care at the moment about the first occurrence of → because it is
the main operator of the sentence we want to derive, while ¬, ∨
and the second occurrence of → are inside the scope of another
connective, so we will take care of those later. We therefore start
with the →Intro rule. A correct application must look like this:


        [¬A ∨ B]¹
            ⋮
          A → B
  1 ---------------------- →Intro
     (¬A ∨ B) → (A → B)


This leaves us with two possibilities to continue. Either we can
keep working from the bottom up and look for another
application of the →Intro rule, or we can work from the top down and
apply a ∨Elim rule. Let us apply the latter. We will use the
assumption ¬A ∨ B as the leftmost premise of ∨Elim. For a valid
application of ∨Elim, the other two premises must be identical
to the conclusion A → B, but each may be derived in turn from
another assumption, namely one of the two disjuncts of ¬A ∨ B.
So our derivation will look like this:


                     [¬A]²       [B]²
                       ⋮           ⋮
     [¬A ∨ B]¹       A → B       A → B
  2 ------------------------------------ ∨Elim
                   A → B
  1 ---------------------- →Intro
     (¬A ∨ B) → (A → B)


In each of the two branches on the right, we want to derive 
A — B, which is best done using —Intro. 


                  [¬A]², [A]³          [B]², [A]⁴
                       ⋮                    ⋮
                       B                    B
                3 -------- →Intro    4 -------- →Intro
    [¬A ∨ B]¹      A → B                A → B
  2 ---------------------------------------------- ∨Elim
                      A → B
  1 ---------------------------------------------- →Intro
                (¬A ∨ B) → (A → B)

For the two missing parts of the derivation, we need derivations
of B from ¬A and A in the middle, and from A and B on the
right. Let’s take the former first. ¬A and A are the two premises of
¬Elim:

    [¬A]²    [A]³
    ------------- ¬Elim
          ⊥
          ⋮
          B

By using ⊥_I, we can obtain B as a conclusion and complete the
branch.

                  [¬A]²   [A]³
                  ------------ ¬Elim
                       ⊥                [B]², [A]⁴
                  ------------ ⊥_I           ⋮
                       B                     B
                3 -------- →Intro     4 -------- →Intro
    [¬A ∨ B]¹      A → B                 A → B
  2 ----------------------------------------------- ∨Elim
                      A → B
  1 ----------------------------------------------- →Intro
                (¬A ∨ B) → (A → B)

Let’s now look at the rightmost branch. Here it’s important
to realize that the definition of derivation allows assumptions to be
discharged but does not require them to be. In other words, if we
can derive B from one of the assumptions A and B without using
the other, that’s ok. And to derive B from B is trivial: B by itself
is such a derivation, and no inferences are needed. So we can
simply delete the assumption A.

                  [¬A]²   [A]³
                  ------------ ¬Elim
                       ⊥
                  ------------ ⊥_I
                       B                   [B]²
                3 -------- →Intro       -------- →Intro
    [¬A ∨ B]¹      A → B                 A → B
  2 ----------------------------------------------- ∨Elim
                      A → B
  1 ----------------------------------------------- →Intro
                (¬A ∨ B) → (A → B)

Note that in the finished derivation, the rightmost →Intro
inference does not actually discharge any assumptions.
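The shape of the finished derivation can be checked with a short Lean 4 sketch (core library only; the names `h`, `hna`, `ha`, `hb` are illustrative). `Or.elim` corresponds to ∨Elim, `absurd` combines ¬Elim with ⊥_I, and the unused argument `_` in the right branch reflects the →Intro inference that discharges nothing.

```lean
example (A B : Prop) : (¬A ∨ B) → (A → B) :=
  fun h =>                            -- →Intro, discharging ¬A ∨ B
    h.elim                            -- ∨Elim on that assumption
      (fun hna ha => absurd ha hna)   -- left case: ¬Elim, then ⊥_I gives B
      (fun hb _ => hb)                -- right case: A is never used
```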


Example 11.6. So far we have not needed the ⊥_C rule. It is
special in that it allows us to discharge an assumption that isn’t a
sub-formula of the conclusion of the rule. It is closely related to
the ⊥_I rule. In fact, the ⊥_I rule is a special case of the ⊥_C rule;
there is a logic called “intuitionistic logic” in which only ⊥_I is
allowed. The ⊥_C rule is a last resort when nothing else works. For
instance, suppose we want to derive A ∨ ¬A. Our usual strategy
would be to attempt to derive A ∨ ¬A using ∨Intro. But this would
require us to derive either A or ¬A from no assumptions, and this
can’t be done. ⊥_C to the rescue!

    [¬(A ∨ ¬A)]¹
         ⋮
         ⊥
  1 ---------- ⊥_C
      A ∨ ¬A

Now we’re looking for a derivation of ⊥ from ¬(A ∨ ¬A). Since
⊥ is the conclusion of ¬Elim we might try that:

    [¬(A ∨ ¬A)]¹     [¬(A ∨ ¬A)]¹
         ⋮                ⋮
        ¬A                A
    --------------------------- ¬Elim
                 ⊥
  1 --------------------------- ⊥_C
               A ∨ ¬A

Our strategy for finding a derivation of ¬A calls for an application
of ¬Intro:

    [¬(A ∨ ¬A)]¹, [A]²
            ⋮               [¬(A ∨ ¬A)]¹
            ⊥                    ⋮
  2 ----------- ¬Intro           A
           ¬A
    ------------------------------------ ¬Elim
                     ⊥
  1 ------------------------------------ ⊥_C
                   A ∨ ¬A

Here, we can get ⊥ easily by applying ¬Elim to the assumption
¬(A ∨ ¬A) and A ∨ ¬A which follows from our new assumption
A by ∨Intro:

                       [A]²
                     -------- ∨Intro
    [¬(A ∨ ¬A)]¹      A ∨ ¬A
    ------------------------- ¬Elim     [¬(A ∨ ¬A)]¹
                ⊥                            ⋮
  2 ----------- ¬Intro                       A
               ¬A
    ------------------------------------------------ ¬Elim
                          ⊥
  1 ------------------------------------------------ ⊥_C
                        A ∨ ¬A

On the right side we use the same strategy, except we get A by ⊥_C:

                       [A]²                                [¬A]³
                     -------- ∨Intro                     -------- ∨Intro
    [¬(A ∨ ¬A)]¹      A ∨ ¬A            [¬(A ∨ ¬A)]¹      A ∨ ¬A
    ------------------------- ¬Elim     ------------------------- ¬Elim
                ⊥                                   ⊥
  2 ----------- ¬Intro                3 ----------- ⊥_C
               ¬A                                   A
    ------------------------------------------------------ ¬Elim
                             ⊥
  1 ------------------------------------------------------ ⊥_C
                           A ∨ ¬A
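The structure of this derivation, including both uses of ⊥_C, can be mirrored in a Lean 4 sketch (core library only). Here `Classical.byContradiction` is assumed to play the role of ⊥_C; this is a correspondence chosen for illustration, not something the text itself states.

```lean
example (A : Prop) : A ∨ ¬A :=
  Classical.byContradiction fun h =>
    -- h : ¬(A ∨ ¬A), the assumption labelled 1
    have hna : ¬A := fun ha => h (Or.inl ha)   -- left branch: ∨Intro, ¬Elim, ¬Intro
    have ha : A :=                             -- right branch: ∨Intro, ¬Elim, ⊥_C
      Classical.byContradiction fun hna' => h (Or.inr hna')
    hna ha                                     -- final ¬Elim yields ⊥
```

Note that the classical step is used twice, just as ⊥_C appears twice in the tree above.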
11.6 Derivations with Quantifiers


Example 11.7. When dealing with quantifiers, we have to make 
sure not to violate the eigenvariable condition, and sometimes 
this requires us to play around with the order of carrying out 
certain inferences. In general, it helps to try and take care of rules 
subject to the eigenvariable condition first (they will be lower 
down in the finished proof). 

Let’s see how we’d give a derivation of the formula
∃x ¬A(x) → ¬∀x A(x). Starting as usual, we write

    ∃x ¬A(x) → ¬∀x A(x)

We start by writing down what it would take to justify that last
step using the →Intro rule.

    [∃x ¬A(x)]¹
         ⋮
      ¬∀x A(x)
  1 --------------------- →Intro
    ∃x ¬A(x) → ¬∀x A(x)

Since there is no obvious rule to apply to ¬∀x A(x), we will
proceed by setting up the derivation so we can use the ∃Elim rule.
Here we must pay attention to the eigenvariable condition, and
choose a constant that does not appear in ∃x ¬A(x) or any
assumptions that it depends on. (Since no constant symbols appear,
however, any choice will do fine.)

                     [¬A(a)]²
                        ⋮
    [∃x ¬A(x)]¹      ¬∀x A(x)
  2 -------------------------- ∃Elim
            ¬∀x A(x)
  1 -------------------------- →Intro
       ∃x ¬A(x) → ¬∀x A(x)

In order to derive ¬∀x A(x), we will attempt to use the ¬Intro
rule: this requires that we derive a contradiction, possibly using
∀x A(x) as an additional assumption. Of course, this contradiction
may involve the assumption ¬A(a) which will be discharged
by the ∃Elim inference. We can set it up as follows:

                     [¬A(a)]², [∀x A(x)]³
                              ⋮
                              ⊥
                   3 ----------- ¬Intro
    [∃x ¬A(x)]¹        ¬∀x A(x)
  2 ------------------------------ ∃Elim
              ¬∀x A(x)
  1 ------------------------------ →Intro
         ∃x ¬A(x) → ¬∀x A(x)

It looks like we are close to getting a contradiction. The easiest
rule to apply is the ∀Elim, which has no eigenvariable conditions.
Since we can use any term we want to replace the universally
quantified x, it makes the most sense to continue using a so we
can reach a contradiction.

                                 [∀x A(x)]³
                                 ---------- ∀Elim
                     [¬A(a)]²       A(a)
                     ---------------------- ¬Elim
                              ⊥
                   3 ----------- ¬Intro
    [∃x ¬A(x)]¹        ¬∀x A(x)
  2 ------------------------------ ∃Elim
              ¬∀x A(x)
  1 ------------------------------ →Intro
         ∃x ¬A(x) → ¬∀x A(x)

It is important, especially when dealing with quantifiers, to
double check at this point that the eigenvariable condition has
not been violated. Since the only rule we applied that is subject
to the eigenvariable condition was ∃Elim, and the eigenvariable a
does not occur in any assumptions it depends on, this is a correct
derivation.
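The order of inferences in this example (→Intro at the bottom, then ∃Elim, then ¬Intro, with ∀Elim and ¬Elim at the top) can be reproduced in a Lean 4 sketch. The type `α` standing in for the domain and the names `hex`, `hna`, `hall` are assumptions of this illustration. In Lean the bound name `a` introduced by `Exists.elim` is automatically fresh, which is how the eigenvariable condition shows up there.

```lean
example {α : Type} (A : α → Prop) : (∃ x, ¬A x) → ¬(∀ x, A x) :=
  fun hex =>                   -- →Intro, discharging ∃x ¬A(x)
    hex.elim fun a hna =>      -- ∃Elim with fresh witness a and hna : ¬A a
      fun hall =>              -- ¬Intro, discharging ∀x A(x)
        hna (hall a)           -- ∀Elim at a, then ¬Elim
```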


Example 11.8. Sometimes we may derive a formula from other 
formulas. In these cases, we may have undischarged assumptions. 
It is important to keep track of our assumptions as well as the end 
goal. 
Let’s see how we’d give a derivation of the formula ∃x C(x, b)
from the assumptions ∃x (A(x) ∧ B(x)) and ∀x (B(x) → C(x, b)).
Starting as usual, we write the conclusion at the bottom.

    ∃x C(x, b)

We have two premises to work with. To use the first, i.e., try
to find a derivation of ∃x C(x, b) from ∃x (A(x) ∧ B(x)) we would
use the ∃Elim rule. Since it has an eigenvariable condition, we
will apply that rule first. We get the following:

                          [A(a) ∧ B(a)]¹
                                ⋮
    ∃x (A(x) ∧ B(x))        ∃x C(x, b)
  1 ----------------------------------- ∃Elim
                ∃x C(x, b)

The two assumptions we are working with share B. It may be
useful at this point to apply ∧Elim to separate out B(a).

                          [A(a) ∧ B(a)]¹
                          -------------- ∧Elim
                               B(a)
                                ⋮
    ∃x (A(x) ∧ B(x))        ∃x C(x, b)
  1 ----------------------------------- ∃Elim
                ∃x C(x, b)

The second assumption we have to work with is ∀x (B(x) →
C(x, b)). Since there is no eigenvariable condition we can
instantiate x with the constant symbol a using ∀Elim to get B(a) →
C(a, b). We now have both B(a) → C(a, b) and B(a). Our next
move should be a straightforward application of the →Elim rule.

                          ∀x (B(x) → C(x, b))          [A(a) ∧ B(a)]¹
                          ------------------- ∀Elim    -------------- ∧Elim
                            B(a) → C(a, b)                  B(a)
                          --------------------------------------------- →Elim
                                           C(a, b)
                                              ⋮
    ∃x (A(x) ∧ B(x))                      ∃x C(x, b)
  1 ------------------------------------------------- ∃Elim
                      ∃x C(x, b)

We are so close! One application of ∃Intro and we have reached
our goal.

                          ∀x (B(x) → C(x, b))          [A(a) ∧ B(a)]¹
                          ------------------- ∀Elim    -------------- ∧Elim
                            B(a) → C(a, b)                  B(a)
                          --------------------------------------------- →Elim
                                           C(a, b)
                                         ----------- ∃Intro
    ∃x (A(x) ∧ B(x))                     ∃x C(x, b)
  1 ------------------------------------------------- ∃Elim
                      ∃x C(x, b)

Since we ensured at each step that the eigenvariable conditions
were not violated, we can be confident that this is a correct
derivation.
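A derivation from undischarged assumptions corresponds to a proof with hypotheses. The Lean 4 sketch below (core library only) treats the two assumptions as hypotheses `h1` and `h2`; the domain type `α` and all names are illustrative assumptions.

```lean
example {α : Type} (A B : α → Prop) (C : α → α → Prop) (b : α)
    (h1 : ∃ x, A x ∧ B x) (h2 : ∀ x, B x → C x b) : ∃ x, C x b :=
  h1.elim fun a hab =>        -- ∃Elim with fresh witness a, hab : A a ∧ B a
    ⟨a, h2 a hab.right⟩       -- ∀Elim, ∧Elim, →Elim, then ∃Intro
```

The hypotheses `h1` and `h2` remain open, matching the two undischarged assumptions of the derivation.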


Example 11.9. Give a derivation of the formula ¬∀x A(x) from
the assumptions ∀x A(x) → ∃y B(y) and ¬∃y B(y). Starting as
usual, we write the target formula at the bottom.

    ¬∀x A(x)

The last line of the derivation is a negation, so let’s try using
¬Intro. This will require that we figure out how to derive a
contradiction.

    [∀x A(x)]¹
        ⋮
        ⊥
  1 ---------- ¬Intro
     ¬∀x A(x)

So far so good. We can use ∀Elim but it’s not obvious if that will
help us get to our goal. Instead, let’s use one of our assumptions.
∀x A(x) → ∃y B(y) together with ∀x A(x) will allow us to use the
→Elim rule.

    ∀x A(x) → ∃y B(y)     [∀x A(x)]¹
    --------------------------------- →Elim
                ∃y B(y)
                   ⋮
                   ⊥
  1 ---------- ¬Intro
     ¬∀x A(x)

We now have one final assumption to work with, and it looks like
this will help us reach a contradiction by using ¬Elim.

                  ∀x A(x) → ∃y B(y)     [∀x A(x)]¹
                  --------------------------------- →Elim
    ¬∃y B(y)                  ∃y B(y)
    ----------------------------------- ¬Elim
                    ⊥
  1 ---------- ¬Intro
     ¬∀x A(x)
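In Lean 4 the finished derivation collapses to a one-line term. This is a sketch under the assumption that the two premises are given as hypotheses `h1` and `h2` over an arbitrary domain type `α`.

```lean
example {α : Type} (A B : α → Prop)
    (h1 : (∀ x, A x) → ∃ y, B y) (h2 : ¬(∃ y, B y)) : ¬(∀ x, A x) :=
  fun hall =>        -- ¬Intro, discharging ∀x A(x)
    h2 (h1 hall)     -- →Elim gives ∃y B(y); ¬Elim with h2 gives ⊥
```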


11.7 Proof-Theoretic Notions


Just as we’ve defined a number of important semantic notions 
(validity, entailment, satisfiability), we now define corresponding 
proof-theoretic notions. These are not defined by appeal to satisfac- 
tion of sentences in structures, but by appeal to the derivability 
or non-derivability of certain sentences from others. It was an 
important discovery that these notions coincide. That they do is 
the content of the soundness and completeness theorems. 


Definition 11.10 (Theorems). A sentence A is a theorem if there
is a derivation of A in natural deduction in which all assumptions
are discharged. We write ⊢ A if A is a theorem and ⊬ A if it is not.


Definition 11.11 (Derivability). A sentence A is derivable from
a set of sentences Γ, Γ ⊢ A, if there is a derivation with
conclusion A and in which every assumption is either discharged or is
in Γ. If A is not derivable from Γ we write Γ ⊬ A.


Definition 11.12 (Consistency). A set of sentences Γ is
inconsistent iff Γ ⊢ ⊥. If Γ is not inconsistent, i.e., if Γ ⊬ ⊥, we say it is
consistent.


Proposition 11.13 (Reflexivity). If A ∈ Γ, then Γ ⊢ A.

Proof. The assumption A by itself is a derivation of A where every
undischarged assumption (i.e., A) is in Γ. □


Proposition 11.14 (Monotonicity). If Γ ⊆ Δ and Γ ⊢ A, then Δ ⊢ A.

Proof. Any derivation of A from Γ is also a derivation of A
from Δ. □


Proposition 11.15 (Transitivity). If Γ ⊢ A and {A} ∪ Δ ⊢ B, then
Γ ∪ Δ ⊢ B.

Proof. If Γ ⊢ A, there is a derivation δ₀ of A with all undischarged
assumptions in Γ. If {A} ∪ Δ ⊢ B, then there is a derivation δ₁
of B with all undischarged assumptions in {A} ∪ Δ. Now consider:

    Δ, [A]¹
       ⋮ δ₁            Γ
       B               ⋮ δ₀
  1 ------- →Intro
     A → B             A
    ---------------------- →Elim
              B

The undischarged assumptions are now all among Γ ∪ Δ, so this
shows Γ ∪ Δ ⊢ B. □


When Γ = {A₁, A₂, ..., Aₖ} is a finite set we may use the
simplified notation A₁, A₂, ..., Aₖ ⊢ B for Γ ⊢ B, in particular A ⊢ B
means that {A} ⊢ B.

Note that if Γ ⊢ A and A ⊢ B, then Γ ⊢ B. It follows also that
if A₁, ..., Aₙ ⊢ B and Γ ⊢ Aᵢ for each i, then Γ ⊢ B.


Proposition 11.16. The following are equivalent.
1. Γ is inconsistent.
2. Γ ⊢ A for every sentence A.
3. Γ ⊢ A and Γ ⊢ ¬A for some sentence A.


Proof. Exercise. □


Proposition 11.17 (Compactness). 1. If Γ ⊢ A then there is
a finite subset Γ₀ ⊆ Γ such that Γ₀ ⊢ A.

2. If every finite subset of Γ is consistent, then Γ is consistent.


Proof. 1. If Γ ⊢ A, then there is a derivation δ of A from Γ.
Let Γ₀ be the set of undischarged assumptions of δ. Since
any derivation is finite, Γ₀ can only contain finitely many
sentences. So, δ is a derivation of A from a finite Γ₀ ⊆ Γ.

2. This is the contrapositive of (1) for the special case A ≡ ⊥.
□


11.8 Derivability and Consistency 


We will now establish a number of properties of the derivability 
relation. They are independently interesting, but each will play 
a role in the proof of the completeness theorem. 
Proposition 11.18. If Γ ⊢ A and Γ ∪ {A} is inconsistent, then Γ is
inconsistent.

Proof. Let the derivation of A from Γ be δ₁ and the derivation
of ⊥ from Γ ∪ {A} be δ₂. We can then derive:

    Γ, [A]¹
       ⋮ δ₂            Γ
       ⊥               ⋮ δ₁
  1 ------- ¬Intro
      ¬A               A
    ---------------------- ¬Elim
              ⊥

In the new derivation, the assumption A is discharged, so it is
a derivation from Γ. □


Proposition 11.19. Γ ⊢ A iff Γ ∪ {¬A} is inconsistent.

Proof. First suppose Γ ⊢ A, i.e., there is a derivation δ₀ of A from
undischarged assumptions Γ. We obtain a derivation of ⊥ from
Γ ∪ {¬A} as follows:

             Γ
             ⋮ δ₀
    ¬A       A
    ------------ ¬Elim
         ⊥

Now assume Γ ∪ {¬A} is inconsistent, and let δ₁ be the
corresponding derivation of ⊥ from undischarged assumptions
in Γ ∪ {¬A}. We obtain a derivation of A from Γ alone by
using ⊥_C:

    Γ, [¬A]¹
        ⋮ δ₁
        ⊥
  1 -------- ⊥_C
        A
□


Proposition 11.20. If Γ ⊢ A and ¬A ∈ Γ, then Γ is inconsistent.

Proof. Suppose Γ ⊢ A and ¬A ∈ Γ. Then there is a derivation δ
of A from Γ. Consider this simple application of the ¬Elim rule:

             Γ
             ⋮ δ
    ¬A       A
    ------------ ¬Elim
         ⊥

Since ¬A ∈ Γ, all undischarged assumptions are in Γ, so this shows
that Γ ⊢ ⊥. □


Proposition 11.21. If Γ ∪ {A} and Γ ∪ {¬A} are both inconsistent,
then Γ is inconsistent.

Proof. There are derivations δ₁ and δ₂ of ⊥ from Γ ∪ {A} and ⊥
from Γ ∪ {¬A}, respectively. We can then derive

    Γ, [¬A]²               Γ, [A]¹
        ⋮ δ₂                  ⋮ δ₁
        ⊥                     ⊥
  2 -------- ¬Intro     1 -------- ¬Intro
       ¬¬A                   ¬A
    -------------------------------- ¬Elim
                   ⊥

Since the assumptions A and ¬A are discharged, this is a
derivation of ⊥ from Γ alone. Hence Γ is inconsistent. □
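The last construction can be sketched in Lean 4 under a simplifying assumption: the set Γ is represented by a single proposition `G`, and the derivations δ₁ and δ₂ by functions `d1` and `d2`. These encodings are illustrative only and are not part of the text.

```lean
example (G A : Prop) (d1 : G → A → False) (d2 : G → ¬A → False) : G → False :=
  fun g =>
    have hna : ¬A := d1 g       -- ¬Intro applied to δ₁, discharging A
    have hnna : ¬¬A := d2 g     -- ¬Intro applied to δ₂, discharging ¬A
    hnna hna                    -- ¬Elim yields ⊥
```

The sketch uses no classical axiom, which matches the derivation: it needs only ¬Intro and ¬Elim, not ⊥_C.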


11.9 Derivability and the Propositional 
Connectives 


We establish that the derivability relation ⊢ of natural deduction
is strong enough to establish some basic facts involving the
propositional connectives, such as that A ∧ B ⊢ A and A, A → B ⊢ B
(modus ponens). These facts are needed for the proof of the
completeness theorem.
Proposition 11.22. 1. Both A ∧ B ⊢ A and A ∧ B ⊢ B.

2. A, B ⊢ A ∧ B.

Proof. 1. We can derive both

    A ∧ B             A ∧ B
    ----- ∧Elim       ----- ∧Elim
      A                 B

2. We can derive:

    A     B
    ------- ∧Intro
     A ∧ B
□


Proposition 11.23. 1. A ∨ B, ¬A, ¬B is inconsistent.

2. Both A ⊢ A ∨ B and B ⊢ A ∨ B.

Proof. 1. Consider the following derivation:

               ¬A   [A]¹            ¬B   [B]¹
               --------- ¬Elim      --------- ¬Elim
    A ∨ B          ⊥                    ⊥
  1 ------------------------------------------ ∨Elim
                       ⊥

This is a derivation of ⊥ from undischarged assumptions
A ∨ B, ¬A, and ¬B.

2. We can derive both

      A                  B
    ----- ∨Intro       ----- ∨Intro
    A ∨ B              A ∨ B
□


Proposition 11.24. 1. A, A → B ⊢ B.

2. Both ¬A ⊢ A → B and B ⊢ A → B.

Proof. 1. We can derive:

    A → B     A
    ----------- →Elim
         B

2. This is shown by the following two derivations:

    ¬A    [A]¹
    ---------- ¬Elim
        ⊥
    ---------- ⊥_I
        B                       B
  1 ------- →Intro           ------- →Intro
     A → B                    A → B

Note that →Intro may, but does not have to, discharge the
assumption A. □
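The three facts about → used in the last proof can be checked with a Lean 4 sketch (core library only; hypothesis names are illustrative). The third example shows an implication introduced without using its antecedent, the counterpart of an →Intro that discharges nothing.

```lean
-- Modus ponens: A, A → B ⊢ B
example (A B : Prop) (ha : A) (hab : A → B) : B := hab ha

-- ¬A ⊢ A → B: ¬Elim, ⊥_I, then →Intro discharging A
example (A B : Prop) (hna : ¬A) : A → B := fun ha => absurd ha hna

-- B ⊢ A → B: →Intro with no assumption discharged
example (A B : Prop) (hb : B) : A → B := fun _ => hb
```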


11.10 Derivability and the Quantifiers 


The completeness theorem also requires that the natural deduction
rules yield the facts about ⊢ established in this section.


Theorem 11.25. If c is a constant not occurring in Γ or A(x) and
Γ ⊢ A(c), then Γ ⊢ ∀x A(x).

Proof. Let δ be a derivation of A(c) from Γ. By adding a ∀Intro
inference, we obtain a derivation of ∀x A(x). Since c does not
occur in Γ or A(x), the eigenvariable condition is satisfied. □


Proposition 11.26. 1. A(t) ⊢ ∃x A(x).

2. ∀x A(x) ⊢ A(t).

Proof. 1. The following is a derivation of ∃x A(x) from A(t):

      A(t)
    -------- ∃Intro
    ∃x A(x)

2. The following is a derivation of A(t) from ∀x A(x):

    ∀x A(x)
    -------- ∀Elim
      A(t)
□
11.11 Soundness


A derivation system, such as natural deduction, is sound if it 
cannot derive things that do not actually follow. Soundness is 
thus a kind of guaranteed safety property for derivation systems. 
Depending on which proof theoretic property is in question, we 
would like to know for instance, that 


1. every derivable sentence is valid; 


2. if a sentence is derivable from some others, it is also a 
consequence of them; 


3. if a set of sentences is inconsistent, it is unsatisfiable. 


These are important properties of a derivation system. If any of 
them do not hold, the derivation system is deficient—it would 
derive too much. Consequently, establishing the soundness of a 
derivation system is of the utmost importance. 


Theorem 11.27 (Soundness). If A is derivable from the undischarged
assumptions Γ, then Γ ⊨ A.

Proof. Let δ be a derivation of A. We proceed by induction on
the number of inferences in δ.

For the induction basis we show the claim if the number of
inferences is 0. In this case, δ consists only of a single sentence A,
i.e., an assumption. That assumption is undischarged, since
assumptions can only be discharged by inferences, and there are
no inferences. So, any structure M that satisfies all of the
undischarged assumptions of the proof also satisfies A.

Now for the inductive step. Suppose that δ contains n
inferences. The premise(s) of the lowermost inference are derived
using sub-derivations, each of which contains fewer than n
inferences. We assume the induction hypothesis: The premises of the
lowermost inference follow from the undischarged assumptions
of the sub-derivations ending in those premises. We have to show
that the conclusion A follows from the undischarged assumptions
of the entire proof.

We distinguish cases according to the type of the lowermost
inference. First, we consider the possible inferences with only
one premise.


1. Suppose that the last inference is ¬Intro: The derivation
has the form

    Γ, [A]ⁿ
       ⋮ δ₁
       ⊥
  n ------- ¬Intro
      ¬A

By inductive hypothesis, ⊥ follows from the undischarged
assumptions Γ ∪ {A} of δ₁. Consider a structure M. We
need to show that, if M ⊨ Γ, then M ⊨ ¬A. Suppose for
reductio that M ⊨ Γ, but M ⊭ ¬A, i.e., M ⊨ A. This would
mean that M ⊨ Γ ∪ {A}. This is contrary to our inductive
hypothesis, since no structure satisfies ⊥. So, M ⊨ ¬A.


2. The last inference is ∧Elim: There are two variants: A or
B may be inferred from the premise A ∧ B. Consider the
first case. The derivation δ looks like this:

      Γ
      ⋮ δ₁
    A ∧ B
    ----- ∧Elim
      A

By inductive hypothesis, A ∧ B follows from the undischarged
assumptions Γ of δ₁. Consider a structure M. We
need to show that, if M ⊨ Γ, then M ⊨ A. Suppose M ⊨ Γ.
By our inductive hypothesis (Γ ⊨ A ∧ B), we know that
M ⊨ A ∧ B. By definition, M ⊨ A ∧ B iff M ⊨ A and M ⊨ B.
In particular, M ⊨ A.
(The case where B is inferred from A ∧ B is handled
similarly.)


3. The last inference is ∨Intro: There are two variants: A ∨
B may be inferred from the premise A or the premise B.
Consider the first case. The derivation has the form

      Γ
      ⋮ δ₁
      A
    ----- ∨Intro
    A ∨ B

By inductive hypothesis, A follows from the undischarged
assumptions Γ of δ₁. Consider a structure M. We need to
show that, if M ⊨ Γ, then M ⊨ A ∨ B. Suppose M ⊨ Γ; then
M ⊨ A since Γ ⊨ A (the inductive hypothesis). So it must
also be the case that M ⊨ A ∨ B. (The case where A ∨ B is
inferred from B is handled similarly.)


4. The last inference is →Intro: A → B is inferred from a
subproof with assumption A and conclusion B, i.e.,

    Γ, [A]ⁿ
       ⋮ δ₁
       B
  n ------- →Intro
     A → B

By inductive hypothesis, B follows from the undischarged
assumptions of δ₁, i.e., Γ ∪ {A} ⊨ B. Consider a structure M.
The undischarged assumptions of δ are just Γ, since A is
discharged at the last inference. So we need to show that
Γ ⊨ A → B. For reductio, suppose that for some structure M,
M ⊨ Γ but M ⊭ A → B. So, M ⊨ A and M ⊭ B. But by
hypothesis, B is a consequence of Γ ∪ {A}, i.e., M ⊨ B,
which is a contradiction. So, Γ ⊨ A → B.
5. The last inference is ⊥_I: Here, δ ends in

      Γ
      ⋮ δ₁
      ⊥
    ----- ⊥_I
      A

By induction hypothesis, Γ ⊨ ⊥. We have to show that
Γ ⊨ A. Suppose not; then for some M we have M ⊨ Γ and
M ⊭ A. But we always have M ⊭ ⊥, so this would mean that
Γ ⊭ ⊥, contrary to the induction hypothesis.


6. The last inference is ⊥_C: Exercise.

7. The last inference is ∀Intro: Then δ has the form

       Γ
       ⋮ δ₁
      A(a)
    -------- ∀Intro
    ∀x A(x)

The premise A(a) is a consequence of the undischarged
assumptions Γ by induction hypothesis. Consider some
structure, M, such that M ⊨ Γ. We need to show that M ⊨
∀x A(x). Since ∀x A(x) is a sentence, this means we have
to show that for every variable assignment s, M, s ⊨ A(x)
(Proposition 7.18). Since Γ consists entirely of sentences,
M, s ⊨ B for all B ∈ Γ by Definition 7.11. Let M′ be like
M except that a^M′ = s(x). Since a does not occur in Γ,
M′ ⊨ Γ by Corollary 7.20. Since Γ ⊨ A(a), M′ ⊨ A(a).
Since A(a) is a sentence, M′, s ⊨ A(a) by Proposition 7.17.
M′, s ⊨ A(x) iff M′ ⊨ A(a) by Proposition 7.22 (recall that
A(a) is just A(x)[a/x]). So, M′, s ⊨ A(x). Since a does not
occur in A(x), by Proposition 7.19, M, s ⊨ A(x). But s was
an arbitrary variable assignment, so M ⊨ ∀x A(x).

8. The last inference is ∃Intro: Exercise.


9. The last inference is ∀Elim: Exercise.


Now let’s consider the possible inferences with several
premises: ∧Intro, ∨Elim, →Elim, ¬Elim, and ∃Elim.


1. The last inference is ∧Intro. A ∧ B is inferred from the
premises A and B and δ has the form

    Γ₁        Γ₂
    ⋮ δ₁      ⋮ δ₂
    A         B
    ------------- ∧Intro
        A ∧ B

By induction hypothesis, A follows from the undischarged
assumptions Γ₁ of δ₁ and B follows from the undischarged
assumptions Γ₂ of δ₂. The undischarged assumptions of δ
are Γ₁ ∪ Γ₂, so we have to show that Γ₁ ∪ Γ₂ ⊨ A ∧ B. Consider
a structure M with M ⊨ Γ₁ ∪ Γ₂. Since M ⊨ Γ₁, it must be
the case that M ⊨ A as Γ₁ ⊨ A, and since M ⊨ Γ₂, M ⊨ B
since Γ₂ ⊨ B. Together, M ⊨ A ∧ B.


2. The last inference is ∨Elim: Exercise.


3. The last inference is →Elim. B is inferred from the
premises A → B and A. The derivation δ looks like this:

     Γ₁        Γ₂
     ⋮ δ₁      ⋮ δ₂
    A → B      A
    -------------- →Elim
          B

By induction hypothesis, A → B follows from the
undischarged assumptions Γ₁ of δ₁ and A follows from the
undischarged assumptions Γ₂ of δ₂. Consider a structure M. We
need to show that, if M ⊨ Γ₁ ∪ Γ₂, then M ⊨ B. Suppose
M ⊨ Γ₁ ∪ Γ₂. Since Γ₁ ⊨ A → B, M ⊨ A → B. Since Γ₂ ⊨ A, we
have M ⊨ A. This means that M ⊨ B (for if M ⊭ B, since
M ⊨ A, we’d have M ⊭ A → B, contradicting M ⊨ A → B).


4. The last inference is ¬Elim: Exercise.


5. The last inference is ∃Elim: Exercise. □


Corollary 11.28. If ⊢ A, then A is valid.


Corollary 11.29. If Γ is satisfiable, then it is consistent.

Proof. We prove the contrapositive. Suppose that Γ is not
consistent. Then Γ ⊢ ⊥, i.e., there is a derivation of ⊥ from
undischarged assumptions in Γ. By Theorem 11.27, any structure M
that satisfies Γ must satisfy ⊥. Since M ⊭ ⊥ for every structure M,
no M can satisfy Γ, i.e., Γ is not satisfiable. □


11.12 Derivations with Identity predicate 


Derivations with identity predicate require additional inference
rules.

    ------ =Intro
    t = t

    t₁ = t₂    A(t₁)              t₁ = t₂    A(t₂)
    ---------------- =Elim        ---------------- =Elim
         A(t₂)                         A(t₁)

In the above rules, t, t₁, and t₂ are closed terms. The =Intro
rule allows us to derive any identity statement of the form t = t
outright, from no assumptions.


Example 11.30. If s and t are closed terms, then A(s), s = t ⊢
A(t):

    s = t    A(s)
    ------------- =Elim
        A(t)

This may be familiar as the “principle of substitutability of
identicals,” or Leibniz’ Law.
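Substitutability of identicals has a direct counterpart in Lean 4, where rewriting along an equation is written with `▸`. The sketch below (core library only) takes `s` and `t` as elements of an arbitrary type `α`, an assumption standing in for closed terms.

```lean
example {α : Type} (A : α → Prop) (s t : α) (h : s = t) (hs : A s) : A t :=
  h ▸ hs    -- =Elim: rewrite along s = t to turn A s into A t
```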


Example 11.31. We derive the sentence

    ∀x ∀y ((A(x) ∧ A(y)) → x = y)

from the sentence

    ∃x ∀y (A(y) → y = x)

We develop the derivation backwards:

    ∃x ∀y (A(y) → y = x)     [A(a) ∧ A(b)]¹
                   ⋮
                 a = b
  1 ----------------------------- →Intro
       (A(a) ∧ A(b)) → a = b
    ----------------------------- ∀Intro
     ∀y ((A(a) ∧ A(y)) → a = y)
    ----------------------------- ∀Intro
    ∀x ∀y ((A(x) ∧ A(y)) → x = y)

We’ll now have to use the main assumption: since it is an
existential formula, we use ∃Elim to derive the intermediary conclusion
a = b.

                              [∀y (A(y) → y = c)]²
                              [A(a) ∧ A(b)]¹
                                     ⋮
    ∃x ∀y (A(y) → y = x)           a = b
  2 ------------------------------------- ∃Elim
                  a = b
  1 ----------------------------- →Intro
       (A(a) ∧ A(b)) → a = b
    ----------------------------- ∀Intro
     ∀y ((A(a) ∧ A(y)) → a = y)
    ----------------------------- ∀Intro
    ∀x ∀y ((A(x) ∧ A(y)) → x = y)


CHAPTER 11. NATURAL DEDUCTION 217 


The sub-derivation on the top right is completed by using its
assumptions to show that a = c and b = c. This requires two
separate derivations. The derivation for a = c is as follows:

    [∀y (A(y) → y = c)]²          [A(a) ∧ A(b)]¹
    -------------------- ∀Elim    -------------- ∧Elim
        A(a) → a = c                   A(a)
    ----------------------------------------- →Elim
                     a = c

From a = c and b = c we derive a = b by =Elim.
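The whole derivation can be mirrored in a Lean 4 sketch (core library only; the domain type `α` and all hypothesis names are illustrative assumptions). The final step uses `Eq.trans` and `Eq.symm`, which stand in for the single =Elim inference that replaces c by b in a = c.

```lean
example {α : Type} (A : α → Prop)
    (h : ∃ x, ∀ y, A y → y = x) : ∀ x y, (A x ∧ A y) → x = y :=
  fun a b hab =>                          -- two ∀Intro steps and →Intro
    h.elim fun c hc =>                    -- ∃Elim with fresh witness c
      have hac : a = c := hc a hab.left   -- ∀Elim, ∧Elim, →Elim
      have hbc : b = c := hc b hab.right  -- the same steps for b
      hac.trans hbc.symm                  -- from a = c and b = c obtain a = b
```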


11.13 Soundness with Identity predicate 


Proposition 11.32. Natural deduction with rules for = is sound.

Proof. Any formula of the form t = t is valid, since for every
structure M, M ⊨ t = t. (Note that we assume the term t to be
closed, i.e., it contains no variables, so variable assignments are
irrelevant.)

Suppose the last inference in a derivation is =Elim, i.e., the
derivation has the following form:

      Γ₁          Γ₂
      ⋮ δ₁        ⋮ δ₂
    t₁ = t₂      A(t₁)
    ------------------ =Elim
          A(t₂)

The premises t₁ = t₂ and A(t₁) are derived from undischarged
assumptions Γ₁ and Γ₂, respectively. We want to show that A(t₂)
follows from Γ₁ ∪ Γ₂. Consider a structure M with M ⊨ Γ₁ ∪ Γ₂.
By induction hypothesis, M ⊨ A(t₁) and M ⊨ t₁ = t₂. Therefore,
Val^M(t₁) = Val^M(t₂). Let s be any variable assignment, and
m = Val^M(t₁) = Val^M(t₂). By Proposition 7.22, M, s ⊨ A(t₁) iff
M, s[m/x] ⊨ A(x) iff M, s ⊨ A(t₂). Since M ⊨ A(t₁), we have
M ⊨ A(t₂). □
Summary


Proof systems provide purely syntactic methods for characterizing
consequence and compatibility between sentences. Natural
deduction is one such proof system. A derivation in it consists
of a tree of formulas. The topmost formulas in a derivation are
assumptions. All other formulas, for the derivation to be correct,
must be correctly justified by one of a number of inference rules.
These come in pairs: an introduction and an elimination rule for
each connective and quantifier. For instance, if a formula A is
justified by a →Elim rule, the preceding formulas (the premises)
must be B → A and B (for some B). Some inference rules also
allow assumptions to be discharged. For instance, if A → B is
inferred from B using →Intro, any occurrences of A as assumptions
in the derivation leading to the premise B may be discharged, and
are given a label that is also recorded at the inference.

If there is a derivation with end formula A and all assumptions
are discharged, we say A is a theorem and write ⊢ A. If all
undischarged assumptions are in some set Γ, we say A is derivable
from Γ and write Γ ⊢ A. If Γ ⊢ ⊥ we say Γ is inconsistent,
otherwise consistent. These notions are interrelated, e.g., Γ ⊢ A iff
Γ ∪ {¬A} is inconsistent. They are also related to the corresponding
semantic notions, e.g., if Γ ⊢ A then Γ ⊨ A. This property
of proof systems (what can be derived from Γ is guaranteed to
be entailed by Γ) is called soundness. The soundness theorem
is proved by induction on the length of derivations, showing
that each individual inference preserves entailment of its
conclusion from open assumptions provided its premises are entailed
by their undischarged assumptions.


Problems 


Problem 11.1. Give derivations that show the following:

1. A ∧ (B ∧ C) ⊢ (A ∧ B) ∧ C.
2. A ∨ (B ∨ C) ⊢ (A ∨ B) ∨ C.
3. A → (B → C) ⊢ B → (A → C).
4. A ⊢ ¬¬A.


Problem 11.2. Give derivations that show the following:

1. (A ∨ B) → C ⊢ A → C.
2. (A → C) ∧ (B → C) ⊢ (A ∨ B) → C.
3. ⊢ ¬(A ∧ ¬A).
4. B → A ⊢ ¬A → ¬B.
5. ⊢ (A → ¬A) → ¬A.
6. ⊢ ¬(A → B) → ¬B.
7. A → C ⊢ ¬(A ∧ ¬C).
8. A ∧ ¬C ⊢ ¬(A → C).
9. A ∨ B, ¬B ⊢ A.
10. ¬A ∨ ¬B ⊢ ¬(A ∧ B).
11. ⊢ (¬A ∧ ¬B) → ¬(A ∨ B).
12. ⊢ ¬(A ∨ B) → (¬A ∧ ¬B).


Problem 11.3. Give derivations that show the following:

1. ¬(A → B) ⊢ A.
2. ¬(A ∧ B) ⊢ ¬A ∨ ¬B.
3. A → B ⊢ ¬A ∨ B.
4. ⊢ ¬¬A → A.
5. A → B, ¬A → B ⊢ B.
6. (A ∧ B) → C ⊢ (A → C) ∨ (B → C).
7. (A → B) → A ⊢ A.
8. ⊢ (A → B) ∨ (B → C).

(These all require the ⊥_C rule.)


Problem 11.4. Give derivations that show the following:

1. ⊢ (∀x A(x) ∧ ∀y B(y)) → ∀z (A(z) ∧ B(z)).
2. ⊢ (∃x A(x) ∨ ∃y B(y)) → ∃z (A(z) ∨ B(z)).
3. ∀x (A(x) → B) ⊢ ∃y A(y) → B.
4. ∀x ¬A(x) ⊢ ¬∃x A(x).
5. ⊢ ¬∃x A(x) → ∀x ¬A(x).
6. ⊢ ¬∃x ∀y ((A(x, y) → ¬A(y, y)) ∧ (¬A(y, y) → A(x, y))).


Problem 11.5. Give derivations that show the following:

1. ⊢ ¬∀x A(x) → ∃x ¬A(x).
2. (∀x A(x) → B) ⊢ ∃y (A(y) → B).
3. ⊢ ∃x (A(x) → ∀y A(y)).

(These all require the ⊥_C rule.)


Problem 11.6. Prove Proposition 11.16.


Problem 11.7. Prove that Γ ⊢ ¬A iff Γ ∪ {A} is inconsistent.


Problem 11.8. Complete the proof of Theorem 11.27.


Problem 11.9. Prove that = is both symmetric and transitive,
i.e., give derivations of ∀x ∀y (x = y → y = x) and
∀x ∀y ∀z ((x = y ∧ y = z) → x = z).


Problem 11.10. Give derivations of the following formulas:

1. ∀x ∀y ((x = y ∧ A(x)) → A(y))

2. ∃x A(x) ∧ ∀y ∀z ((A(y) ∧ A(z)) → y = z) → ∃x (A(x) ∧
∀y (A(y) → y = x))


CHAPTER 12


The Completeness Theorem


12.1 Introduction 


The completeness theorem is one of the most fundamental re- 
sults about logic. It comes in two formulations, the equivalence 
of which we'll prove. In its first formulation it says something fun- 
damental about the relationship between semantic consequence 
and our derivation system: if a sentence A follows from some
sentences Γ, then there is also a derivation that establishes Γ ⊢ A.
Thus, the derivation system is as strong as it can possibly be 
without proving things that don’t actually follow. 

In its second formulation, it can be stated as a model exis- 
tence result: every consistent set of sentences is satisfiable. Con- 
sistency is a proof-theoretic notion: it says that our derivation 
system is unable to produce certain derivations. But who’s to 
say that just because there are no derivations of a certain sort 
from Γ, it’s guaranteed that there is a structure M that satisfies Γ?
Before the completeness theorem was first proved (in fact, before we
had the derivation systems we now do), the great German mathematician
David Hilbert held the view that consistency of mathematical the- 
ories guarantees the existence of the objects they are about. He 
put it as follows in a letter to Gottlob Frege: 


If the arbitrarily given axioms do not contradict one 
another with all their consequences, then they are 
true and the things defined by the axioms exist. This 
is for me the criterion of truth and existence. 


Frege vehemently disagreed. The second formulation of the com- 
pleteness theorem shows that Hilbert was right in at least the 
sense that if the axioms are consistent, then some structure exists 
that makes them all true. 

These aren’t the only reasons the completeness theorem—or 
rather, its proof—is important. It has a number of important con- 
sequences, some of which we'll discuss separately. For instance, 
since any derivation that shows Γ ⊢ A is finite and so can only
use finitely many of the sentences in Γ, it follows by the
completeness theorem that if A is a consequence of Γ, it is already
a consequence of a finite subset of Γ. This is called compactness.
Equivalently, if every finite subset of Γ is consistent, then Γ itself
must be consistent.

Although the compactness theorem follows from the com- 
pleteness theorem via the detour through derivations, it is also 
possible to use the proof of the completeness theorem to establish
it directly. For what the proof does is take a set of sentences
with a certain property (consistency) and construct a structure
out of this set that has certain properties (in this case, that it sat- 
isfies the set). Almost the very same construction can be used 
to directly establish compactness, by starting from “finitely sat- 
isfiable” sets of sentences instead of consistent ones. The con- 
struction also yields other consequences, e.g., that any satisfiable 
set of sentences has a finite or countably infinite model. (This 
result is called the Löwenheim-Skolem theorem.) In general, the
construction of structures from sets of sentences is used often in 
logic, and sometimes even in philosophy. 




12.2 Outline of the Proof 


The proof of the completeness theorem is a bit complex, and 
upon first reading it, it is easy to get lost. So let us outline the 
proof. The first step is a shift of perspective, that allows us to see 
a route to a proof. When completeness is thought of as “whenever
Γ ⊨ A then Γ ⊢ A,” it may be hard to even come up with an idea:
for to show that Γ ⊢ A we have to find a derivation, and it does
not look like the hypothesis that Γ ⊨ A helps us for this in any
way. For some proof systems it is possible to directly construct
a derivation, but we will take a slightly different approach. The
shift in perspective required is this: completeness can also be
formulated as: “if Γ is consistent, it is satisfiable.” Perhaps we
can use the information in Γ together with the hypothesis that it
is consistent to construct a structure that satisfies every sentence
in Γ. After all, we know what kind of structure we are looking
for: one that is as Γ describes it!

If Γ contains only atomic sentences, it is easy to construct a
model for it. Suppose the atomic sentences are all of the form
P(a₁, ..., aₙ) where the aᵢ are constant symbols. All we have to
do is come up with a domain |M| and an assignment for P so
that M ⊨ P(a₁, ..., aₙ). But that’s not very hard: put |M| = ℕ,
cᵢ^M = i, and for every P(a₁, ..., aₙ) ∈ Γ, put the tuple ⟨k₁, ..., kₙ⟩
into P^M, where kᵢ is the index of the constant symbol aᵢ (i.e.,
aᵢ ≡ c_kᵢ).

Now suppose Γ contains some formula ¬B, with B atomic.
We might worry that the construction of M interferes with the
possibility of making ¬B true. But here’s where the consistency
of Γ comes in: if ¬B ∈ Γ, then B ∉ Γ, or else Γ would be
inconsistent. And if B ∉ Γ, then according to our construction
of M, M ⊭ B, so M ⊨ ¬B. So far so good.

What if Γ contains complex, non-atomic formulas? Say it
contains A ∧ B. To make that true, we should proceed as if both
A and B were in Γ. And if A ∨ B ∈ Γ, then we will have to make
at least one of them true, i.e., proceed as if one of them was in Γ.
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This suggests the following idea: we add additional formulas 
to I’ so as to (a) keep the resulting set consistent and (b) make 
sure that for every possible atomic sentence A, either A is in the 
resulting set, or —A is, and (c) such that, whenever A A B is in 
the set, so are both A and B, if AV B is in the set, at least one of 
A or B is also, etc. We keep doing this (potentially forever). Call 
the set of all formulas so added J. Then our construction above 
would provide us with a structure M for which we could prove, 
by induction, that it satisfies all sentences in *, and hence also 
all sentence in IT since  C J“. It turns out that guaranteeing 
(a) and (b) is enough. A set of sentences for which (b) holds is 
called complete. So our task will be to extend the consistent set 
to a consistent and complete set /™. 

There is one wrinkle in this plan: if $\exists x\, A(x) \in \Gamma$ we would
hope to be able to pick some constant symbol $c$ and add $A(c)$
in this process. But how do we know we can always do that?
Perhaps we only have a few constant symbols in our language,
and for each one of them we have $\lnot A(c) \in \Gamma$. We can’t also add
$A(c)$, since this would make the set inconsistent, and we wouldn’t
know whether $\mathfrak{M}$ has to make $A(c)$ or $\lnot A(c)$ true. Moreover, it
might happen that $\Gamma$ contains only sentences in a language that
has no constant symbols at all (e.g., the language of set theory).

The solution to this problem is to simply add infinitely many 
constants at the beginning, plus sentences that connect them with 
the quantifiers in the right way. (Of course, we have to verify that 
this cannot introduce an inconsistency.) 

Our original construction works well if we only have constant
symbols in the atomic sentences. But the language might also
contain function symbols. In that case, it might be tricky to find
the right functions on $\mathbb{N}$ to assign to these function symbols to
make everything work. So here’s another trick: instead of using
$i$ to interpret $c_i$, just take the set of constant symbols itself as
the domain. Then $\mathfrak{M}$ can assign every constant symbol to itself:
$c_i^{\mathfrak{M}} = c_i$. But why not go all the way: let $|\mathfrak{M}|$ be all terms of
the language! If we do this, there is an obvious assignment of
functions (that take terms as arguments and have terms as values)
to function symbols: we assign to the function symbol $f_i^n$ the
function which, given $n$ terms $t_1$, ..., $t_n$ as input, produces the
term $f_i^n(t_1, \dots, t_n)$ as value.

The last piece of the puzzle is what to do with $=$. The
predicate symbol $=$ has a fixed interpretation: $\mathfrak{M} \vDash t = t'$ iff
$\mathrm{Val}^{\mathfrak{M}}(t) = \mathrm{Val}^{\mathfrak{M}}(t')$. Now if we set things up so that the value of
a term $t$ is $t$ itself, then this structure will make no sentence of
the form $t = t'$ true unless $t$ and $t'$ are one and the same term.
And of course this is a problem, since basically every interesting
theory in a language with function symbols will have as theorems
sentences $t = t'$ where $t$ and $t'$ are not the same term (e.g., in
theories of arithmetic: $(0 + 0) = 0$). To solve this problem, we
change the domain of $\mathfrak{M}$: instead of using terms as the objects
in $|\mathfrak{M}|$, we use sets of terms, and each set is such that it contains
all those terms which the sentences in $\Gamma$ require to be equal. So,
e.g., if $\Gamma$ is a theory of arithmetic, one of these sets will contain:
$0$, $(0 + 0)$, $(0 \times 0)$, etc. This will be the set we assign to $0$, and it
will turn out that this set is also the value of all the terms in it,
e.g., also of $(0 + 0)$. Therefore, the sentence $(0 + 0) = 0$ will be
true in this revised structure.

So here’s what we’ll do. First we investigate the properties of
complete consistent sets; in particular, we prove that a complete
consistent set contains $A \land B$ iff it contains both $A$ and $B$, $A \lor B$ iff
it contains at least one of them, etc. (Proposition 12.2). Then we
define and investigate “saturated” sets of sentences. A saturated
set is one which contains conditionals that link each quantified
sentence to instances of it (Definition 12.4). We show that any
consistent set $\Gamma$ can always be extended to a saturated set $\Gamma'$
(Lemma 12.6). If a set is consistent, saturated, and complete it
also has the property that it contains $\exists x\, A(x)$ iff it contains $A(t)$
for some closed term $t$ and $\forall x\, A(x)$ iff it contains $A(t)$ for all
closed terms $t$ (Proposition 12.7). We’ll then take the saturated
consistent set $\Gamma'$ and show that it can be extended to a saturated,
consistent, and complete set $\Gamma^*$ (Lemma 12.8). This set
$\Gamma^*$ is what we’ll use to define our term model $\mathfrak{M}(\Gamma^*)$. The term
model has the set of closed terms as its domain, and the interpretation
of its predicate symbols is given by the atomic sentences
in $\Gamma^*$ (Definition 12.9). We’ll use the properties of saturated, complete
consistent sets to show that indeed $\mathfrak{M}(\Gamma^*) \vDash A$ iff $A \in \Gamma^*$
(Lemma 12.12), and thus in particular, $\mathfrak{M}(\Gamma^*) \vDash \Gamma$. Finally, we’ll
consider how to define a term model if $\Gamma$ contains $=$ as well
(Definition 12.16) and show that it satisfies $\Gamma^*$ (Lemma 12.19).


12.3 Complete Consistent Sets of Sentences 


Definition 12.1 (Complete set). A set $\Gamma$ of sentences is complete
iff for any sentence $A$, either $A \in \Gamma$ or $\lnot A \in \Gamma$.


Complete sets of sentences leave no questions unanswered.
For any sentence $A$, $\Gamma$ “says” if $A$ is true or false. The importance
of complete sets extends beyond the proof of the completeness
theorem. A theory which is complete and axiomatizable, for
instance, is always decidable.

Complete consistent sets are important in the completeness
proof since we can guarantee that every consistent set of sentences
$\Gamma$ is contained in a complete consistent set $\Gamma^*$. A complete
consistent set contains, for each sentence $A$, either $A$ or its
negation $\lnot A$, but not both. This is true in particular for atomic
sentences, so from a complete consistent set in a language suitably
expanded by constant symbols, we can construct a structure
where the interpretation of predicate symbols is defined according
to which atomic sentences are in $\Gamma^*$. This structure can then
be shown to make all sentences in $\Gamma^*$ (and hence also all those
in $\Gamma$) true. The proof of this latter fact requires that $\lnot A \in \Gamma^*$ iff
$A \notin \Gamma^*$, $(A \lor B) \in \Gamma^*$ iff $A \in \Gamma^*$ or $B \in \Gamma^*$, etc.

In what follows, we will often tacitly use the properties of
reflexivity, monotonicity, and transitivity of $\vdash$ (see sections 10.8
and 11.7).

Proof: Let us suppose for all of the following that $\Gamma$ is complete
and consistent.


1. If $\Gamma \vdash A$, then $A \in \Gamma$.

Suppose that $\Gamma \vdash A$. Suppose to the contrary that $A \notin \Gamma$.
Since $\Gamma$ is complete, $\lnot A \in \Gamma$. By Propositions 10.20
and 11.20, $\Gamma$ is inconsistent. This contradicts the assumption
that $\Gamma$ is consistent. Hence, it cannot be the case that
$A \notin \Gamma$, so $A \in \Gamma$.


2. Exercise. 


3. First we show that if $A \lor B \in \Gamma$, then either $A \in \Gamma$ or
$B \in \Gamma$. Suppose $A \lor B \in \Gamma$ but $A \notin \Gamma$ and $B \notin \Gamma$.
Since $\Gamma$ is complete, $\lnot A \in \Gamma$ and $\lnot B \in \Gamma$. By Propositions
10.23 and 11.23, item (1), $\Gamma$ is inconsistent, a contradiction.
Hence, either $A \in \Gamma$ or $B \in \Gamma$.

For the reverse direction, suppose that $A \in \Gamma$ or $B \in \Gamma$. By
Propositions 10.23 and 11.23, item (2), $\Gamma \vdash A \lor B$. By (1),
$A \lor B \in \Gamma$, as required.

4. Exercise. □


12.4 Henkin Expansion 


Part of the challenge in proving the completeness theorem is that
the model we construct from a complete consistent set $\Gamma$ must
make all the quantified formulas in $\Gamma$ true. In order to guarantee
this, we use a trick due to Leon Henkin. In essence, the
trick consists in expanding the language by infinitely many constant
symbols and adding, for each formula with one free variable
$A(x)$, a formula of the form $\exists x\, A(x) \rightarrow A(c)$, where $c$ is one of the
new constant symbols. When we construct the structure satisfying
$\Gamma$, this will guarantee that each true existential sentence has
a witness among the new constants.


Definition 12.4 (Saturated set). A set $\Gamma$ of formulas of a language
$\mathcal{L}$ is saturated iff for each formula $A(x) \in \mathrm{Frm}(\mathcal{L})$ with
one free variable $x$ there is a constant symbol $c \in \mathcal{L}$ such that
$\exists x\, A(x) \rightarrow A(c) \in \Gamma$.


The following definition will be used in the proof of the next 
theorem. 


Definition 12.5. Let $\mathcal{L}'$ be as in Proposition 12.3. Fix an enumeration
$A_0(x_0)$, $A_1(x_1)$, ... of all formulas $A_i(x_i)$ of $\mathcal{L}'$ in which
one variable ($x_i$) occurs free. We define the sentences $D_n$ by induction
on $n$.

Let $c_0$ be the first constant symbol among the $d_i$ we added
to $\mathcal{L}$ which does not occur in $A_0(x_0)$. Assuming that $D_0$, ..., $D_{n-1}$
have already been defined, let $c_n$ be the first among the new
constant symbols $d_i$ that occurs neither in $D_0$, ..., $D_{n-1}$ nor
in $A_n(x_n)$.

Now let $D_n$ be the formula $\exists x_n\, A_n(x_n) \rightarrow A_n(c_n)$.

Proof. Given a consistent set of sentences $\Gamma$ in a language $\mathcal{L}$, expand
the language by adding a countably infinite set of new constant
symbols to form $\mathcal{L}'$. By Proposition 12.3, $\Gamma$ is still consistent
in the richer language. Further, let $D_i$ be as in Definition 12.5.
Let

\[
\begin{aligned}
\Gamma_0 &= \Gamma \\
\Gamma_{n+1} &= \Gamma_n \cup \{D_n\}
\end{aligned}
\]

i.e., $\Gamma_{n+1} = \Gamma \cup \{D_0, \dots, D_n\}$, and let $\Gamma' = \bigcup_n \Gamma_n$. $\Gamma'$ is clearly
saturated.

If $\Gamma'$ were inconsistent, then for some $n$, $\Gamma_n$ would be inconsistent
(Exercise: explain why). So to show that $\Gamma'$ is consistent it
suffices to show, by induction on $n$, that each set $\Gamma_n$ is consistent.

The induction basis is simply the claim that $\Gamma_0 = \Gamma$ is consistent,
which is the hypothesis of the theorem. For the induction
step, suppose that $\Gamma_n$ is consistent but $\Gamma_{n+1} = \Gamma_n \cup \{D_n\}$ is inconsistent.
Recall that $D_n$ is $\exists x_n\, A_n(x_n) \rightarrow A_n(c_n)$, where $A_n(x_n)$ is
a formula of $\mathcal{L}'$ with only the variable $x_n$ free. By the way we’ve
chosen the $c_n$ (see Definition 12.5), $c_n$ does not occur in $A_n(x_n)$
nor in $\Gamma_n$.

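If $\Gamma_n \cup \{D_n\}$ is inconsistent, then $\Gamma_n \vdash \lnot D_n$, and hence both
of the following hold:

\[
\Gamma_n \vdash \exists x_n\, A_n(x_n) \qquad \Gamma_n \vdash \lnot A_n(c_n)
\]

Since $c_n$ does not occur in $\Gamma_n$ or in $A_n(x_n)$, Theorems 10.25
and 11.25 apply. From $\Gamma_n \vdash \lnot A_n(c_n)$, we obtain $\Gamma_n \vdash
\forall x_n\, \lnot A_n(x_n)$. Thus we have that both $\Gamma_n \vdash \exists x_n\, A_n(x_n)$ and $\Gamma_n \vdash
\forall x_n\, \lnot A_n(x_n)$, so $\Gamma_n$ itself is inconsistent. (Note that $\forall x_n\, \lnot A_n(x_n) \vdash
\lnot \exists x_n\, A_n(x_n)$.) Contradiction: $\Gamma_n$ was supposed to be consistent.
Hence $\Gamma_n \cup \{D_n\}$ is consistent. □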
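The choice of the witnessing constants $c_n$ in Definition 12.5 can be sketched in Python for a finite initial segment of the enumeration. This is only an illustration under assumptions of our own: each formula $A_n(x_n)$ is given as a triple consisting of its free variable, a function producing the text of the formula for a given term, and the set of new constants $d_i$ occurring in it. The name `henkin_sentences` and the ASCII rendering of the connectives are hypothetical.

```python
from itertools import count


def henkin_sentences(formulas):
    """Pair each formula A_n(x_n) with a fresh witness constant c_n.

    `formulas` is a list of triples (var, body, constants): `body` maps a
    term name to the text of A_n with that term in place of x_n, and
    `constants` is the set of new constants d_i occurring in A_n.
    Returns the list of pairs (D_n, c_n).
    """
    used = set()  # new constants occurring in D_0, ..., D_{n-1}
    result = []
    for var, body, constants in formulas:
        # c_n: the first d_i occurring neither in earlier D's nor in A_n
        c = next(f"d{i}" for i in count() if f"d{i}" not in used | constants)
        result.append((f"exists {var} {body(var)} -> {body(c)}", c))
        used |= constants | {c}
    return result


formulas = [
    ("x", lambda t: f"P({t})", set()),
    ("y", lambda t: f"R({t}, d0)", {"d0"}),
    ("z", lambda t: f"Q({t}, d1)", {"d1"}),
]
sentences = henkin_sentences(formulas)
```

Because each $c_n$ is chosen to be new, it occurs neither in the earlier sentences nor in $A_n(x_n)$, which is exactly what the induction step of the proof relies on.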


We’ll now show that complete, consistent sets which are saturated
have the property that they contain a universally quantified
sentence iff they contain all its instances, and they contain an existentially
quantified sentence iff they contain at least one instance. We’ll
use this to show that the structure we’ll generate from a complete,
consistent, saturated set makes all its quantified sentences true.


Proof: 1. First suppose that $\exists x\, A(x) \in \Gamma$. Because $\Gamma$ is saturated,
$(\exists x\, A(x) \rightarrow A(c)) \in \Gamma$ for some constant symbol $c$.
By Propositions 10.24 and 11.24, item (1), and Proposition
12.2(1), $A(c) \in \Gamma$.

For the other direction, saturation is not necessary: Suppose
$A(t) \in \Gamma$. Then $\Gamma \vdash \exists x\, A(x)$ by Propositions 10.26
and 11.26, item (1). By Proposition 12.2(1), $\exists x\, A(x) \in \Gamma$.

2. Exercise. □


12.5 Lindenbaum’s Lemma 


We now prove a lemma that shows that any consistent set of sen- 
tences is contained in some set of sentences which is not just 
consistent, but also complete. The proof works by adding one 
sentence at a time, guaranteeing at each step that the set remains 
consistent. We do this so that for every $A$, either $A$ or $\lnot A$ gets
added at some stage. The union of all stages in that construction
then contains either $A$ or its negation $\lnot A$ and is thus complete.
It is also consistent, since we made sure at each stage not to
introduce an inconsistency.

Proof. Let $\Gamma$ be consistent. Let $A_0$, $A_1$, ... be an enumeration of
all the sentences of $\mathcal{L}$. Define $\Gamma_0 = \Gamma$, and

\[
\Gamma_{n+1} =
\begin{cases}
\Gamma_n \cup \{A_n\} & \text{if $\Gamma_n \cup \{A_n\}$ is consistent;} \\
\Gamma_n \cup \{\lnot A_n\} & \text{otherwise.}
\end{cases}
\]

Let $\Gamma^* = \bigcup_{n \geq 0} \Gamma_n$.

Each $\Gamma_n$ is consistent: $\Gamma_0$ is consistent by definition. If
$\Gamma_{n+1} = \Gamma_n \cup \{A_n\}$, this is because the latter is consistent. If it
isn’t, $\Gamma_{n+1} = \Gamma_n \cup \{\lnot A_n\}$. We have to verify that $\Gamma_n \cup \{\lnot A_n\}$ is
consistent. Suppose it’s not. Then both $\Gamma_n \cup \{A_n\}$ and $\Gamma_n \cup \{\lnot A_n\}$
are inconsistent. This means that $\Gamma_n$ would be inconsistent by
Propositions 10.21 and 11.21, contrary to the induction hypothesis.

For every $n$ and every $i < n$, $\Gamma_i \subseteq \Gamma_n$. This follows by a simple
induction on $n$. For $n = 0$, there are no $i < 0$, so the claim holds
automatically. For the inductive step, suppose it is true for $n$.
We have $\Gamma_{n+1} = \Gamma_n \cup \{A_n\}$ or $= \Gamma_n \cup \{\lnot A_n\}$ by construction. So
$\Gamma_n \subseteq \Gamma_{n+1}$. If $i < n$, then $\Gamma_i \subseteq \Gamma_n$ by inductive hypothesis, and so
$\Gamma_i \subseteq \Gamma_{n+1}$ by transitivity of $\subseteq$.

From this it follows that every finite subset of $\Gamma^*$ is a subset
of $\Gamma_n$ for some $n$, since each $B \in \Gamma^*$ not already in $\Gamma_0$ is added at
some stage $i$. If $n$ is the last one of these, then all $B$ in the finite
subset are in $\Gamma_n$. So, every finite subset of $\Gamma^*$ is consistent. By
Propositions 10.17 and 11.17, $\Gamma^*$ is consistent.

Every sentence of $\mathrm{Frm}(\mathcal{L})$ appears on the list used to define
$\Gamma^*$. If $A_n \notin \Gamma^*$, then that is because $\Gamma_n \cup \{A_n\}$ was inconsistent.
But then $\lnot A_n \in \Gamma^*$, so $\Gamma^*$ is complete. □
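The stage-by-stage construction in this proof can be tried out on a small scale. The following Python sketch is a toy version under loudly stated assumptions: it works with propositional formulas only, it processes a finite list of sentences rather than a full enumeration, and it replaces the consistency test by a brute-force truth-table search for a satisfying valuation. That replacement is a stand-in chosen for illustration, not the proof-theoretic notion used in the text. Formulas are nested tuples such as `("or", p, q)`, and all function names are hypothetical.

```python
from itertools import product


def atoms(f):
    """Set of propositional variables occurring in formula f."""
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms(g) for g in f[1:]))


def holds(f, v):
    """Truth value of f under the valuation v (a dict from names to bools)."""
    op = f[0]
    if op == "atom":
        return v[f[1]]
    if op == "not":
        return not holds(f[1], v)
    if op == "and":
        return holds(f[1], v) and holds(f[2], v)
    if op == "or":
        return holds(f[1], v) or holds(f[2], v)
    raise ValueError(op)


def consistent(sentences):
    """Stand-in for consistency: is there a valuation making all true?"""
    names = sorted(set().union(*(atoms(f) for f in sentences)))
    return any(
        all(holds(f, dict(zip(names, vals))) for f in sentences)
        for vals in product([False, True], repeat=len(names))
    )


def lindenbaum(gamma, enumeration):
    """Add A_n if that keeps the stage consistent, and not-A_n otherwise."""
    stage = list(gamma)
    for a in enumeration:
        stage.append(a if consistent(stage + [a]) else ("not", a))
    return stage


p, q = ("atom", "p"), ("atom", "q")
gamma = [("or", p, q), ("not", p)]
enumeration = [p, q, ("and", p, q)]
gamma_star = lindenbaum(gamma, enumeration)
```

In this run `p` cannot be added, so its negation is; `q` is added; and the conjunction of `p` and `q` is rejected in favor of its negation. The result decides every sentence on the list and is still satisfiable.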


12.6 Construction of a Model 


Right now we are not concerned about $=$, i.e., we only want to
show that a consistent set $\Gamma$ of sentences not containing $=$ is satisfiable.
We first extend $\Gamma$ to a consistent, complete, and saturated
set $\Gamma^*$. In this case, the definition of a model $\mathfrak{M}(\Gamma^*)$ is simple: We
take the set of closed terms of $\mathcal{L}'$ as the domain. We assign every
constant symbol to itself, and make sure that more generally, for
every closed term $t$, $\mathrm{Val}^{\mathfrak{M}(\Gamma^*)}(t) = t$. The predicate symbols are
assigned extensions in such a way that an atomic sentence is true
in $\mathfrak{M}(\Gamma^*)$ iff it is in $\Gamma^*$. This will obviously make all the atomic
sentences in $\Gamma^*$ true in $\mathfrak{M}(\Gamma^*)$. The rest are true provided the $\Gamma^*$
we start with is consistent, complete, and saturated.


Definition 12.9 (Term model). Let $\Gamma^*$ be a complete and consistent,
saturated set of sentences in a language $\mathcal{L}$. The term
model $\mathfrak{M}(\Gamma^*)$ of $\Gamma^*$ is the structure defined as follows:

1. The domain $|\mathfrak{M}(\Gamma^*)|$ is the set of all closed terms of $\mathcal{L}$.

2. The interpretation of a constant symbol $c$ is $c$ itself:
$c^{\mathfrak{M}(\Gamma^*)} = c$.

3. The function symbol $f$ is assigned the function which,
given as arguments the closed terms $t_1$, ..., $t_n$, has as value
the closed term $f(t_1, \dots, t_n)$:

\[
f^{\mathfrak{M}(\Gamma^*)}(t_1, \dots, t_n) = f(t_1, \dots, t_n).
\]

4. If $R$ is an $n$-place predicate symbol, then

\[
\langle t_1, \dots, t_n \rangle \in R^{\mathfrak{M}(\Gamma^*)} \text{ iff } R(t_1, \dots, t_n) \in \Gamma^*.
\]


We will now check that we indeed have $\mathrm{Val}^{\mathfrak{M}(\Gamma^*)}(t) = t$.


Proof: The proof is by induction on $t$, where the base case, when
$t$ is a constant symbol, follows directly from the definition of the
term model. For the induction step assume $t_1$, ..., $t_n$ are closed
terms such that $\mathrm{Val}^{\mathfrak{M}(\Gamma^*)}(t_i) = t_i$ and that $f$ is an $n$-ary function
symbol. Then

\[
\begin{aligned}
\mathrm{Val}^{\mathfrak{M}(\Gamma^*)}(f(t_1, \dots, t_n))
  &= f^{\mathfrak{M}(\Gamma^*)}(\mathrm{Val}^{\mathfrak{M}(\Gamma^*)}(t_1), \dots, \mathrm{Val}^{\mathfrak{M}(\Gamma^*)}(t_n)) \\
  &= f^{\mathfrak{M}(\Gamma^*)}(t_1, \dots, t_n) \\
  &= f(t_1, \dots, t_n),
\end{aligned}
\]

and so by induction this holds for every closed term $t$. □
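The way terms evaluate to themselves in the term model can be checked mechanically on examples. The following is a small Python sketch, assuming an encoding of our own choosing: constant symbols are strings, and a compound closed term $f(t_1, \dots, t_n)$ is the tuple `("f", t1, ..., tn)`. The names `term_model` and `value` are hypothetical.

```python
def term_model(function_symbols, constant_symbols):
    """Interpret each constant as itself and each f as the term-building map."""
    constants = {c: c for c in constant_symbols}
    # f is assigned the function taking t1, ..., tn to the term f(t1, ..., tn).
    functions = {f: (lambda *ts, f=f: (f, *ts)) for f in function_symbols}
    return functions, constants


def value(term, functions, constants):
    """Compute the value of a closed term by recursion on its structure."""
    if isinstance(term, str):  # base case: a constant symbol
        return constants[term]
    f, *args = term
    return functions[f](*(value(a, functions, constants) for a in args))


functions, constants = term_model(["f", "g"], ["c", "d"])
t = ("f", "c", ("g", "d"))
```

Evaluating `value(t, functions, constants)` returns `t` itself, mirroring the base case and induction step of the proof.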


A structure $\mathfrak{M}$ may make an existentially quantified sentence
$\exists x\, A(x)$ true without there being an instance $A(t)$ that it
makes true. A structure $\mathfrak{M}$ may make all instances $A(t)$ of a universally
quantified sentence $\forall x\, A(x)$ true, without making $\forall x\, A(x)$
true. This is because in general not every element of $|\mathfrak{M}|$ is the
value of a closed term ($\mathfrak{M}$ may not be covered). This is the reason
the satisfaction relation is defined via variable assignments.
However, for our term model $\mathfrak{M}(\Gamma^*)$ this wouldn’t be necessary,
because it is covered. This is the content of the next result.


Proof. 1. By Proposition 7.18, $\mathfrak{M}(\Gamma^*) \vDash \exists x\, A(x)$ iff for at least
one variable assignment $s$, $\mathfrak{M}(\Gamma^*), s \vDash A(x)$. As $|\mathfrak{M}(\Gamma^*)|$
consists of the closed terms of $\mathcal{L}$, this is the case iff there is
at least one closed term $t$ such that $s(x) = t$ and $\mathfrak{M}(\Gamma^*), s \vDash
A(x)$. By Proposition 7.22, $\mathfrak{M}(\Gamma^*), s \vDash A(x)$ iff $\mathfrak{M}(\Gamma^*), s \vDash
A(t)$, where $s(x) = t$. By Proposition 7.17, $\mathfrak{M}(\Gamma^*), s \vDash A(t)$
iff $\mathfrak{M}(\Gamma^*) \vDash A(t)$, since $A(t)$ is a sentence.

2. Exercise. □

Proof. We prove both directions simultaneously, and by induction
on $A$.


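1. $A \equiv \bot$: $\mathfrak{M}(\Gamma^*) \nvDash \bot$ by definition of satisfaction. On the
other hand, $\bot \notin \Gamma^*$ since $\Gamma^*$ is consistent.

2. $A \equiv R(t_1, \dots, t_n)$: $\mathfrak{M}(\Gamma^*) \vDash R(t_1, \dots, t_n)$ iff $\langle t_1, \dots, t_n \rangle \in
R^{\mathfrak{M}(\Gamma^*)}$ (by the definition of satisfaction) iff $R(t_1, \dots, t_n) \in
\Gamma^*$ (by the construction of $\mathfrak{M}(\Gamma^*)$).

3. $A \equiv \lnot B$: $\mathfrak{M}(\Gamma^*) \vDash A$ iff $\mathfrak{M}(\Gamma^*) \nvDash B$ (by definition of
satisfaction). By induction hypothesis, $\mathfrak{M}(\Gamma^*) \nvDash B$ iff $B \notin
\Gamma^*$. Since $\Gamma^*$ is consistent and complete, $B \notin \Gamma^*$ iff $\lnot B \in \Gamma^*$.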


4. $A \equiv B \land C$: exercise.


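5. $A \equiv B \lor C$: $\mathfrak{M}(\Gamma^*) \vDash A$ iff $\mathfrak{M}(\Gamma^*) \vDash B$ or $\mathfrak{M}(\Gamma^*) \vDash C$
(by definition of satisfaction) iff $B \in \Gamma^*$ or $C \in \Gamma^*$ (by
induction hypothesis). This is the case iff $(B \lor C) \in \Gamma^*$ (by
Proposition 12.2(3)).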


6. $A \equiv B \rightarrow C$: exercise.

7. $A \equiv \forall x\, B(x)$: exercise.

8. $A \equiv \exists x\, B(x)$: $\mathfrak{M}(\Gamma^*) \vDash A$ iff $\mathfrak{M}(\Gamma^*) \vDash B(t)$ for at least
one term $t$ (Proposition 12.11). By induction hypothesis,
this is the case iff $B(t) \in \Gamma^*$ for at least one term $t$. By
Proposition 12.7, this in turn is the case iff $\exists x\, B(x) \in \Gamma^*$.
□


12.7 Identity 


The construction of the term model given in the preceding section
is enough to establish completeness for first-order logic for
sets $\Gamma$ that do not contain $=$. The term model satisfies every
$A \in \Gamma^*$ which does not contain $=$ (and hence all $A \in \Gamma$). It does
not work, however, if $=$ is present. The reason is that $\Gamma^*$ then
may contain a sentence $t = t'$, but in the term model the value of
any term is that term itself. Hence, if $t$ and $t'$ are different terms,
their values in the term model (i.e., $t$ and $t'$, respectively) are
different, and so $t = t'$ is false. We can fix this, however, using a
construction known as “factoring.”


Definition 12.13. Let $\Gamma^*$ be a consistent and complete set of
sentences in $\mathcal{L}$. We define the relation $\approx$ on the set of closed
terms of $\mathcal{L}$ by

\[
t \approx t' \text{ iff } t = t' \in \Gamma^*.
\]


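Proposition 12.14. The relation $\approx$ has the following properties:

1. $\approx$ is reflexive.

2. $\approx$ is symmetric.

3. $\approx$ is transitive.

4. If $t \approx t'$, $f$ is a function symbol, and $t_1$, ..., $t_{i-1}$, $t_{i+1}$, ..., $t_n$
are terms, then

\[
f(t_1, \dots, t_{i-1}, t, t_{i+1}, \dots, t_n) \approx f(t_1, \dots, t_{i-1}, t', t_{i+1}, \dots, t_n).
\]

5. If $t \approx t'$, $R$ is a predicate symbol, and $t_1$, ..., $t_{i-1}$, $t_{i+1}$, ..., $t_n$
are terms, then

\[
R(t_1, \dots, t_{i-1}, t, t_{i+1}, \dots, t_n) \in \Gamma^* \text{ iff } R(t_1, \dots, t_{i-1}, t', t_{i+1}, \dots, t_n) \in \Gamma^*.
\]

Proof: Since $\Gamma^*$ is consistent and complete, $t = t' \in \Gamma^*$ iff $\Gamma^* \vdash
t = t'$. Thus it is enough to show the following:

1. $\Gamma^* \vdash t = t$ for all terms $t$.

2. If $\Gamma^* \vdash t = t'$ then $\Gamma^* \vdash t' = t$.

3. If $\Gamma^* \vdash t = t'$ and $\Gamma^* \vdash t' = t''$, then $\Gamma^* \vdash t = t''$.

4. If $\Gamma^* \vdash t = t'$, then

\[
\Gamma^* \vdash f(t_1, \dots, t_{i-1}, t, t_{i+1}, \dots, t_n) = f(t_1, \dots, t_{i-1}, t', t_{i+1}, \dots, t_n)
\]

for every $n$-place function symbol $f$ and terms $t_1$, ..., $t_{i-1}$,
$t_{i+1}$, ..., $t_n$.

5. If $\Gamma^* \vdash t = t'$ and $\Gamma^* \vdash R(t_1, \dots, t_{i-1}, t, t_{i+1}, \dots, t_n)$, then
$\Gamma^* \vdash R(t_1, \dots, t_{i-1}, t', t_{i+1}, \dots, t_n)$ for every $n$-place predicate
symbol $R$ and terms $t_1$, ..., $t_{i-1}$, $t_{i+1}$, ..., $t_n$. □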


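Definition 12.15. Suppose $\Gamma^*$ is a consistent and complete set
in a language $\mathcal{L}$, $t$ is a term, and $\approx$ as in the previous definition.
Then:

\[
[t]_\approx = \{t' : t' \in \mathrm{Trm}(\mathcal{L}), t \approx t'\}
\]

and $\mathrm{Trm}(\mathcal{L})/_\approx = \{[t]_\approx : t \in \mathrm{Trm}(\mathcal{L})\}$.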
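To see what the classes $[t]_\approx$ look like, here is a small Python sketch that groups a finite stock of closed terms into classes. It rests on assumptions made for illustration: terms are plain strings, and instead of a full set $\Gamma^*$ we are handed a finite list of identities taken to be in it. The function computes the reflexive, symmetric, and transitive closure of that list with a simple union-find; it does not compute closure under function symbols (item 4 of Proposition 12.14). The name `equivalence_classes` is hypothetical.

```python
def equivalence_classes(terms, equations):
    """Map each term to its class under the equivalence generated by equations."""
    parent = {t: t for t in terms}

    def find(t):
        # Follow parent links up to the representative of t's class.
        while parent[t] != t:
            t = parent[t]
        return t

    for s, t in equations:
        parent[find(s)] = find(t)  # merge the classes of s and t

    classes = {}
    for t in terms:
        classes.setdefault(find(t), set()).add(t)
    return {t: frozenset(classes[find(t)]) for t in terms}


terms = ["0", "(0+0)", "(0x0)", "1"]
equations = [("(0+0)", "0"), ("(0x0)", "0")]
cls = equivalence_classes(terms, equations)
```

Here `cls["0"]` contains the three terms `0`, `(0+0)`, and `(0x0)`, while `1` sits in a class of its own, matching the arithmetic example from the outline of the proof.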


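Definition 12.16. Let $\mathfrak{M} = \mathfrak{M}(\Gamma^*)$ be the term model for $\Gamma^*$
from Definition 12.9. Then $\mathfrak{M}/_\approx$ is the following structure:

1. $|\mathfrak{M}/_\approx| = \mathrm{Trm}(\mathcal{L})/_\approx$.

2. $c^{\mathfrak{M}/_\approx} = [c]_\approx$.

3. $f^{\mathfrak{M}/_\approx}([t_1]_\approx, \dots, [t_n]_\approx) = [f(t_1, \dots, t_n)]_\approx$.

4. $\langle [t_1]_\approx, \dots, [t_n]_\approx \rangle \in R^{\mathfrak{M}/_\approx}$ iff $\mathfrak{M} \vDash R(t_1, \dots, t_n)$, i.e., iff
$R(t_1, \dots, t_n) \in \Gamma^*$.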


Note that we have defined $f^{\mathfrak{M}/_\approx}$ and $R^{\mathfrak{M}/_\approx}$ for elements of
$\mathrm{Trm}(\mathcal{L})/_\approx$ by referring to them as $[t]_\approx$, i.e., via representatives $t \in
[t]_\approx$. We have to make sure that these definitions do not depend
on the choice of these representatives, i.e., that for some other
choices $t'$ which determine the same equivalence classes ($[t]_\approx =
[t']_\approx$), the definitions yield the same result. For instance, if $R$
is a one-place predicate symbol, the last clause of the definition
says that $[t]_\approx \in R^{\mathfrak{M}/_\approx}$ iff $\mathfrak{M} \vDash R(t)$. If for some other term $t'$ with
$t \approx t'$, $\mathfrak{M} \nvDash R(t')$, then the definition would require $[t']_\approx \notin R^{\mathfrak{M}/_\approx}$.
If $t \approx t'$, then $[t]_\approx = [t']_\approx$, but we can’t have both $[t]_\approx \in R^{\mathfrak{M}/_\approx}$
and $[t]_\approx \notin R^{\mathfrak{M}/_\approx}$. However, Proposition 12.14 guarantees that
this cannot happen.


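Proposition 12.17. $\mathfrak{M}/_\approx$ is well defined, i.e., if $t_1$, ..., $t_n$, $t'_1$, ..., $t'_n$
are terms, and $t_i \approx t'_i$ then

1. $[f(t_1, \dots, t_n)]_\approx = [f(t'_1, \dots, t'_n)]_\approx$, i.e.,
$f(t_1, \dots, t_n) \approx f(t'_1, \dots, t'_n)$

and

2. $\mathfrak{M} \vDash R(t_1, \dots, t_n)$ iff $\mathfrak{M} \vDash R(t'_1, \dots, t'_n)$, i.e.,
$R(t_1, \dots, t_n) \in \Gamma^*$ iff $R(t'_1, \dots, t'_n) \in \Gamma^*$.

Proof: Follows from Proposition 12.14 by induction on $n$. □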


As in the case of the term model, before proving the truth
lemma we need the following lemma.

Proof: The proof is similar to that of Lemma 12.10. □


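Proof. By induction on $A$, just as in the proof of Lemma 12.12.
The only case that needs additional attention is when $A \equiv t = t'$.

\[
\begin{aligned}
\mathfrak{M}/_\approx \vDash t = t' &\text{ iff } [t]_\approx = [t']_\approx \text{ (by definition of $\mathfrak{M}/_\approx$)} \\
&\text{ iff } t \approx t' \text{ (by definition of $[t]_\approx$)} \\
&\text{ iff } t = t' \in \Gamma^* \text{ (by definition of $\approx$).}
\end{aligned}
\]
□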


Note that while $\mathfrak{M}(\Gamma^*)$ is always countable and infinite, $\mathfrak{M}/_\approx$
may be finite, since it may turn out that there are only finitely
many classes $[t]_\approx$. This is to be expected, since $\Gamma$ may contain
sentences which require any structure in which they are true to
be finite. For instance, $\forall x\, \forall y\; x = y$ is a consistent sentence, but
is satisfied only in structures with a domain that contains exactly
one element.


12.8 The Completeness Theorem 


Let’s combine our results: we arrive at the completeness theorem.


Proof. Suppose $\Gamma$ is consistent. By Lemma 12.6, there is a saturated
consistent set $\Gamma' \supseteq \Gamma$. By Lemma 12.8, there is a $\Gamma^* \supseteq \Gamma'$
which is consistent and complete. Since $\Gamma' \subseteq \Gamma^*$, for each formula
$A(x)$, $\Gamma^*$ contains a sentence of the form $\exists x\, A(x) \rightarrow A(c)$ and
so $\Gamma^*$ is saturated. If $\Gamma$ does not contain $=$, then by Lemma 12.12,
$\mathfrak{M}(\Gamma^*) \vDash A$ iff $A \in \Gamma^*$. From this it follows in particular that for
all $A \in \Gamma$, $\mathfrak{M}(\Gamma^*) \vDash A$, so $\Gamma$ is satisfiable. If $\Gamma$ does contain $=$,
then by Lemma 12.19, for all sentences $A$, $\mathfrak{M}/_\approx \vDash A$ iff $A \in \Gamma^*$. In
particular, $\mathfrak{M}/_\approx \vDash A$ for all $A \in \Gamma$, so $\Gamma$ is satisfiable. □


Proof: Note that the $\Gamma$’s in Corollary 12.21 and Theorem 12.20
are universally quantified. To make sure we do not confuse ourselves,
let us restate Theorem 12.20 using a different variable: for
any set of sentences $\Delta$, if $\Delta$ is consistent, it is satisfiable. By contraposition,
if $\Delta$ is not satisfiable, then $\Delta$ is inconsistent. We will
use this to prove the corollary.

Suppose that $\Gamma \vDash A$. Then $\Gamma \cup \{\lnot A\}$ is unsatisfiable by Proposition
7.27. Taking $\Gamma \cup \{\lnot A\}$ as our $\Delta$, the previous version of
Theorem 12.20 gives us that $\Gamma \cup \{\lnot A\}$ is inconsistent. By Propositions
10.19 and 11.19, $\Gamma \vdash A$. □


12.9 The Compactness Theorem 


One important consequence of the completeness theorem is the 
compactness theorem. The compactness theorem states that if 
each finite subset of a set of sentences is satisfiable, the entire 
set is satisfiable—even if the set itself is infinite. This is far from 
obvious. There is nothing that seems to rule out, at first glance at 
least, the possibility of there being infinite sets of sentences which 
are contradictory, but the contradiction only arises, so to speak, 
from the infinite number. The compactness theorem says that 
such a scenario can be ruled out: there are no unsatisfiable infinite 
sets of sentences each finite subset of which is satisfiable. Like the 
completeness theorem, it has a version related to entailment: if an
infinite set of sentences entails something, already a finite subset
does.

Definition 12.22. A set $\Gamma$ of formulas is finitely satisfiable iff every
finite $\Gamma_0 \subseteq \Gamma$ is satisfiable.


Proof. We prove (2). If $\Gamma$ is satisfiable, then there is a structure $\mathfrak{M}$
such that $\mathfrak{M} \vDash A$ for all $A \in \Gamma$. Of course, this $\mathfrak{M}$ also satisfies
every finite subset of $\Gamma$, so $\Gamma$ is finitely satisfiable.

Now suppose that $\Gamma$ is finitely satisfiable. Then every finite
subset $\Gamma_0 \subseteq \Gamma$ is satisfiable. By soundness (Corollaries 11.29
and 10.31), every finite subset is consistent. Then $\Gamma$ itself must
be consistent by Propositions 10.17 and 11.17. By completeness
(Theorem 12.20), since $\Gamma$ is consistent, it is satisfiable. □


Example 12.24. In every model $\mathfrak{M}$ of a theory $\Gamma$, each term $t$ of
course picks out an element of $|\mathfrak{M}|$. Can we guarantee that it is
also true that every element of $|\mathfrak{M}|$ is picked out by some term or
other? In other words, are there theories all models of which
are covered? The compactness theorem shows that this is not the
case if $\Gamma$ has infinite models. Here’s how to see this: Let $\mathfrak{M}$ be
an infinite model of $\Gamma$, and let $c$ be a constant symbol not in the
language of $\Gamma$. Let $\Delta$ be the set of all sentences $c \neq t$ for $t$ a term
in the language $\mathcal{L}$ of $\Gamma$, i.e.,

\[
\Delta = \{c \neq t : t \in \mathrm{Trm}(\mathcal{L})\}.
\]

A finite subset of $\Gamma \cup \Delta$ can be written as $\Gamma' \cup \Delta'$, with $\Gamma' \subseteq \Gamma$
and $\Delta' \subseteq \Delta$. Since $\Delta'$ is finite, it can contain only finitely many
terms. Let $a \in |\mathfrak{M}|$ be an element of $|\mathfrak{M}|$ not picked out by any
of them, and let $\mathfrak{M}'$ be the structure that is just like $\mathfrak{M}$, but also
$c^{\mathfrak{M}'} = a$. Since $a \neq \mathrm{Val}^{\mathfrak{M}}(t)$ for all $t$ occurring in $\Delta'$, $\mathfrak{M}' \vDash \Delta'$.
Since $\mathfrak{M} \vDash \Gamma$, $\Gamma' \subseteq \Gamma$, and $c$ does not occur in $\Gamma$, also $\mathfrak{M}' \vDash \Gamma'$.
Together, $\mathfrak{M}' \vDash \Gamma' \cup \Delta'$ for every finite subset $\Gamma' \cup \Delta'$ of $\Gamma \cup \Delta$. So
every finite subset of $\Gamma \cup \Delta$ is satisfiable. By compactness, $\Gamma \cup \Delta$
itself is satisfiable. So there are models $\mathfrak{M} \vDash \Gamma \cup \Delta$. Every such
$\mathfrak{M}$ is a model of $\Gamma$, but is not covered, since $\mathrm{Val}^{\mathfrak{M}}(c) \neq \mathrm{Val}^{\mathfrak{M}}(t)$
for all terms $t$ of $\mathcal{L}$.


Example 12.25. Consider a language $\mathcal{L}$ containing the predicate
symbol $<$, constant symbols $0$, $1$, and function symbols $+$,
$\times$, $-$, $\div$. Let $\Gamma$ be the set of all sentences in this language true in
$\mathfrak{Q}$ with domain $\mathbb{Q}$ and the obvious interpretations. $\Gamma$ is the set of
all sentences of $\mathcal{L}$ true about the rational numbers. Of course,
in $\mathbb{Q}$ (and even in $\mathbb{R}$), there are no numbers which are greater
than $0$ but less than $1/k$ for all $k \in \mathbb{Z}^+$. Such a number, if it
existed, would be an infinitesimal: non-zero, but infinitely small.
The compactness theorem shows that there are models of $\Gamma$ in
which infinitesimals exist: Let $\Delta$ be $\{0 < c\} \cup \{c < (1 \div \overline{k}) : k \in \mathbb{Z}^+\}$
(where $\overline{k} = (1 + (1 + \dots + (1 + 1) \dots))$ with $k$ $1$’s). For any finite
subset $\Delta_0$ of $\Delta$ there is a $K$ such that all the sentences $c < (1 \div \overline{k})$
in $\Delta_0$ have $k < K$. If we expand $\mathfrak{Q}$ to $\mathfrak{Q}'$ with $c^{\mathfrak{Q}'} = 1/K$ we have
that $\mathfrak{Q}' \vDash \Gamma \cup \Delta_0$, and so $\Gamma \cup \Delta$ is finitely satisfiable (Exercise:
prove this in detail). By compactness, $\Gamma \cup \Delta$ is satisfiable. Any
model $\mathfrak{S}$ of $\Gamma \cup \Delta$ contains an infinitesimal, namely $c^{\mathfrak{S}}$.
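The finite-satisfiability step of this example is easy to check numerically. The sketch below, in Python with exact rational arithmetic, picks an interpretation of $c$ for a given finite subset $\Delta_0$; a finite subset is represented simply by the list of numbers $k$ for which $c < (1 \div \overline{k})$ belongs to it. That representation and the name `witness_for` are assumptions of ours for illustration only.

```python
from fractions import Fraction


def witness_for(ks):
    """Return a rational value for c that is positive and below 1/k for all k in ks."""
    # Choose K strictly greater than every k mentioned in the finite subset.
    K = max(ks, default=0) + 1
    return Fraction(1, K)


ks = [1, 5, 12]
c = witness_for(ks)
```

For this list the chosen value is $1/13$, which is greater than $0$ and smaller than each of $1/1$, $1/5$, and $1/12$. No single rational works for all $k$ at once, which is why the infinitesimal only appears in the model obtained by compactness.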


Example 12.26. We know that first-order logic with identity
predicate can express that the domain must have some
minimal size: The sentence $A_{\geq n}$ (which says “there are at least
$n$ distinct objects”) is true only in structures where $|\mathfrak{M}|$ has at
least $n$ objects. So if we take

\[
\Delta = \{A_{\geq n} : n \geq 1\}
\]

then any model of $\Delta$ must be infinite. Thus, we can guarantee that
a theory only has infinite models by adding $\Delta$ to it: the models
of $\Gamma \cup \Delta$ are all and only the infinite models of $\Gamma$.

So first-order logic can express infinitude. The compactness
theorem shows that it cannot express finitude, however. For suppose
some set of sentences $\Lambda$ were satisfied in all and only finite
structures. Then $\Delta \cup \Lambda$ is finitely satisfiable. Why? Suppose
$\Delta' \cup \Lambda' \subseteq \Delta \cup \Lambda$ is finite with $\Delta' \subseteq \Delta$ and $\Lambda' \subseteq \Lambda$. Let $n$ be the
largest number such that $A_{\geq n} \in \Delta'$. $\Lambda$, being satisfied in all finite
structures, has a model $\mathfrak{M}$ with finitely many but $\geq n$ elements.
But then $\mathfrak{M} \vDash \Delta' \cup \Lambda'$. By compactness, $\Delta \cup \Lambda$ has an infinite
model, contradicting the assumption that $\Lambda$ is satisfied only in
finite structures.


12.10 A Direct Proof of the Compactness 
Theorem 


We can prove the Compactness Theorem directly, without appealing
to the Completeness Theorem, using the same ideas as in the
proof of the completeness theorem. In the proof of the Completeness
Theorem we started with a consistent set $\Gamma$ of sentences,
expanded it to a consistent, saturated, and complete set $\Gamma^*$ of
sentences, and then showed that in the term model $\mathfrak{M}(\Gamma^*)$ constructed
from $\Gamma^*$, all sentences of $\Gamma$ are true, so $\Gamma$ is satisfiable.

We can use the same method to show that a finitely satis- 
fiable set of sentences is satisfiable. We just have to prove the 
corresponding versions of the results leading to the truth lemma 
where we replace “consistent” with “finitely satisfiable.” 


Lemma 12.28. Every finitely satisfiable set $\Gamma$ can be extended to a
saturated finitely satisfiable set $\Gamma'$.


Proposition 12.29. Suppose $\Gamma$ is complete, finitely satisfiable, and
saturated.

1. $\exists x\, A(x) \in \Gamma$ iff $A(t) \in \Gamma$ for at least one closed term $t$.

2. $\forall x\, A(x) \in \Gamma$ iff $A(t) \in \Gamma$ for all closed terms $t$.


Lemma 12.30. Every finitely satisfiable set $\Gamma$ can be extended to
a complete and finitely satisfiable set $\Gamma^*$.


Theorem 12.31 (Compactness). $\Gamma$ is satisfiable if and only if it
is finitely satisfiable.


Proof. If $\Gamma$ is satisfiable, then there is a structure $\mathfrak{M}$ such that
$\mathfrak{M} \vDash A$ for all $A \in \Gamma$. Of course, this $\mathfrak{M}$ also satisfies every finite
subset of $\Gamma$, so $\Gamma$ is finitely satisfiable.

Now suppose that $\Gamma$ is finitely satisfiable. By Lemma 12.28,
there is a finitely satisfiable, saturated set $\Gamma' \supseteq \Gamma$. By
Lemma 12.30, $\Gamma'$ can be extended to a complete and finitely
satisfiable set $\Gamma^*$, and $\Gamma^*$ is still saturated. Construct the term
model $\mathfrak{M}(\Gamma^*)$ as in Definition 12.9. Note that Proposition 12.11
did not rely on the fact that $\Gamma^*$ is consistent (or complete or saturated,
for that matter), but just on the fact that $\mathfrak{M}(\Gamma^*)$ is covered.
The proof of the Truth Lemma (Lemma 12.12) goes through if
we replace references to Proposition 12.2 and Proposition 12.7 by
references to Proposition 12.27 and Proposition 12.29. □


12.11 The Löwenheim-Skolem Theorem


The Löwenheim-Skolem Theorem says that if a theory has an infinite
model, then it also has a model that is at most countably
infinite. An immediate consequence of this fact is that first-order
logic cannot express that the size of a structure is uncountable:
any sentence or set of sentences satisfied in all uncountable structures
is also satisfied in some countable structure.


Proof. If $\Gamma$ is consistent, the structure $\mathfrak{M}$ delivered by the proof
of the completeness theorem has a domain $|\mathfrak{M}|$ that is no larger
than the set of the terms of the language $\mathcal{L}$. So $\mathfrak{M}$ is at most
countably infinite. □


Proof. If $\Gamma$ is consistent and contains no sentences in which identity
appears, then the structure $\mathfrak{M}$ delivered by the proof of the
completeness theorem has a domain $|\mathfrak{M}|$ identical to the set of
terms of the language $\mathcal{L}'$. So $\mathfrak{M}$ is countably infinite, since
$\mathrm{Trm}(\mathcal{L}')$ is. □


Example 12.34 (Skolem’s Paradox). Zermelo-Fraenkel set
theory ZFC is a very powerful framework in which practically
all mathematical statements can be expressed, including facts
about the sizes of sets. So for instance, ZFC can prove that
the set $\mathbb{R}$ of real numbers is uncountable, it can prove Cantor’s
Theorem that the power set of any set is larger than the set
itself, etc. If ZFC is consistent, its models are all infinite, and
moreover, they all contain elements about which the theory says
that they are uncountable, such as the element that makes true
the theorem of ZFC that the power set of the natural numbers
exists. By the Löwenheim-Skolem Theorem, ZFC also has countable
models: models that contain “uncountable” sets but which
themselves are countable.


Summary 


The completeness theorem is the converse of the soundness
theorem. In one form it states that if $\Gamma \vDash A$ then $\Gamma \vdash A$, in another
that if $\Gamma$ is consistent then it is satisfiable. We proved the
second form (and derived the first from the second). The proof is
involved and requires a number of steps. We start with a consistent
set $\Gamma$. First we add infinitely many new constant symbols $c_i$
as well as formulas of the form $\exists x\, A(x) \rightarrow A(c)$ where each formula
$A(x)$ with a free variable in the expanded language is paired
with one of the new constants. This results in a saturated
set of sentences containing $\Gamma$. It is still consistent. Now
we take that set and extend it to a complete consistent set. A
complete consistent set has the nice property that for any sentence
$A$, either $A$ or $\lnot A$ is in the set (but never both). Since we
started from a saturated set, we now have a saturated, complete,
consistent set of sentences $\Gamma^*$ that includes $\Gamma$. From this set it
is now possible to define a structure $\mathfrak{M}(\Gamma^*)$ such that $\mathfrak{M}(\Gamma^*) \vDash A$ iff
$A \in \Gamma^*$. In particular, $\mathfrak{M}(\Gamma^*) \vDash \Gamma$, i.e., $\Gamma$ is satisfiable. If $=$ is
present, the construction is slightly more complex.

Two important corollaries follow from the completeness theorem.
The compactness theorem states that $\Gamma \vDash A$ iff $\Gamma_0 \vDash A$ for
some finite $\Gamma_0 \subseteq \Gamma$. An equivalent formulation is that $\Gamma$
is satisfiable iff every finite $\Gamma_0 \subseteq \Gamma$ is satisfiable. The
compactness theorem is useful to prove the existence of
structures with certain properties. For instance, we can use it
to show that there are infinite models for every theory which has
arbitrarily large finite models. This means in particular that
finitude cannot be expressed in first-order logic. The second
corollary, the Löwenheim-Skolem Theorem, states that every
satisfiable $\Gamma$ has a countable model. It in turn shows that
uncountability cannot be expressed in first-order logic.


Problems 

Problem 12.1. Complete the proof of Proposition 12.2. 
Problem 12.2. Complete the proof of Proposition 12.11. 
Problem 12.3. Complete the proof of Lemma 12.12. 
Problem 12.4. Complete the proof of Proposition 12.14. 
Problem 12.5. Complete the proof of Lemma 12.18. 


Problem 12.6. Use Corollary 12.21 to prove Theorem 12.20, 
thus showing that the two formulations of the completeness the- 
orem are equivalent. 


Problem 12.7. In order for a derivation system to be complete, 
its rules must be strong enough to prove every unsatisfiable set 
inconsistent. Which of the rules of derivation were necessary to 
prove completeness? Are any of these rules not used anywhere 
in the proof? In order to answer these questions, make a list or 
diagram that shows which of the rules of derivation were used in 
which results that lead up to the proof of Theorem 12.20. Be sure 
to note any tacit uses of rules in these proofs. 


Problem 12.8. Prove (1) of Theorem 12.23. 


Problem 12.9. In the standard model of arithmetic $\mathfrak{N}$, there is
no element $k \in |\mathfrak{N}|$ which satisfies every formula $\overline{n} < x$
(where $\overline{n}$ is $0^{\prime \ldots \prime}$ with $n$ $\prime$’s). Use the compactness
theorem to show that the sentences in the language of arithmetic
which are true in the standard model of arithmetic $\mathfrak{N}$ are also
true in a structure $\mathfrak{N}'$ that contains an element which does
satisfy every formula $\overline{n} < x$.


Problem 12.10. Prove Proposition 12.27. Avoid the use of $\vdash$.


Problem 12.11. Prove Lemma 12.28. (Hint: The crucial step is
to show that if $\Gamma_n$ is finitely satisfiable, so is $\Gamma_n \cup \{D_n\}$,
without any appeal to derivations or consistency.)


Problem 12.12. Prove Proposition 12.29. 


Problem 12.13. Prove Lemma 12.30. (Hint: the crucial step is
to show that if $\Gamma_n$ is finitely satisfiable, then either
$\Gamma_n \cup \{A_n\}$ or $\Gamma_n \cup \{\lnot A_n\}$ is finitely satisfiable.)


Problem 12.14. Write out the complete proof of the Truth 
Lemma (Lemma 12.12) in the version required for the proof of 
Theorem 12.31. 


CHAPTER 13


Beyond First-order Logic


13.1 Overview 


First-order logic is not the only system of logic of interest: there 
are many extensions and variations of first-order logic. A logic 
typically consists of the formal specification of a language, usu- 
ally, but not always, a deductive system, and usually, but not 
always, an intended semantics. But the technical use of the term 
raises an obvious question: what do logics that are not first-order 
logic have to do with the word “logic,” used in the intuitive or 
philosophical sense? All of the systems described below are de- 
signed to model reasoning of some form or another; can we say 
what makes them logical? 

No easy answers are forthcoming. The word “logic” is used 
in different ways and in different contexts, and the notion, like 
that of “truth,” has been analyzed from numerous philosophical 
stances. For example, one might take the goal of logical
reasoning to be the determination of which statements are
necessarily true, true a priori, true independent of the
interpretation of the
nonlogical terms, true by virtue of their form, or true by linguistic 
convention; and each of these conceptions requires a good deal 
of clarification. Even if one restricts one’s attention to the kind of 
logic used in mathematics, there is little agreement as to its scope. 
For example, in the Principia Mathematica, Russell and Whitehead 
tried to develop mathematics on the basis of logic, in the logicist 
tradition begun by Frege. Their system of logic was a form of 
higher-type logic similar to the one described below. In the end 
they were forced to introduce axioms which, by most standards, 
do not seem purely logical (notably, the axiom of infinity, and 
the axiom of reducibility), but one might nonetheless hold that 
some forms of higher-order reasoning should be accepted as logi- 
cal. In contrast, Quine, whose ontology does not admit “proposi- 
tions” as legitimate objects of discourse, argues that second-order 
and higher-order logic are really manifestations of set theory in 
sheep’s clothing; in other words, systems involving quantification 
over predicates are not purely logical. 

For now, it is best to leave such philosophical issues for a rainy 
day, and simply think of the systems below as formal idealizations 
of various kinds of reasoning, logical or otherwise. 


13.2 Many-Sorted Logic 


In first-order logic, variables and quantifiers range over a single 
domain. But it is often useful to have multiple (disjoint) domains: 
for example, you might want to have a domain of numbers, a do- 
main of geometric objects, a domain of functions from numbers 
to numbers, a domain of abelian groups, and so on. 

Many-sorted logic provides this kind of framework. One starts
with a list of “sorts”; the “sort” of an object indicates the
“domain” it is supposed to inhabit. One then has variables and
quantifiers for each sort, and (usually) an identity predicate
for each sort. Functions and relations are also “typed” by the
sorts of objects they can take as arguments. Otherwise, one keeps
the usual rules of first-order logic, with versions of the
quantifier rules repeated for each sort.

For example, to study international relations we might choose 
a language with two sorts of objects, French citizens and German 
citizens. We might have a unary relation, “drinks wine,” for ob- 
jects of the first sort; another unary relation, “eats wurst,” for 
objects of the second sort; and a binary relation, “forms a multi- 
national married couple,” which takes two arguments, where the 
first argument is of the first sort and the second argument is of 
the second sort. If we use variables a, b, c to range over French 
citizens and x, y, z to range over German citizens, then 


\[ \forall a\, \forall x\, (\mathrm{MarriedTo}(a, x) \to (\mathrm{DrinksWine}(a) \lor \lnot \mathrm{EatsWurst}(x))) \]


asserts that if any French person is married to a German, either 
the French person drinks wine or the German doesn’t eat wurst. 

Many-sorted logic can be embedded in first-order logic in a 
natural way, by lumping all the objects of the many-sorted do- 
mains together into one first-order domain, using unary predi- 
cate symbols to keep track of the sorts, and relativizing quanti- 
fiers. For example, the first-order language corresponding to the 
example above would have unary predicate symbols “German” 
and “French,” in addition to the other relations described, with 
the sort requirements erased. A sorted quantifier $\forall x\, A$, where $x$
is a variable of the German sort, translates to


\[ \forall x\, (\mathrm{German}(x) \to A). \]
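
The relativizing translation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the formula classes (`Atom`, `Not`, `Or`, `Implies`, `ForAll`) and the functions `relativize` and `show` are hypothetical names introduced here, and only the connectives needed for the wine-and-wurst sentence are covered.

```python
from dataclasses import dataclass

# Hypothetical mini-AST for sorted formulas (names are illustrative only).
@dataclass(frozen=True)
class Atom:
    pred: str
    args: tuple

@dataclass(frozen=True)
class Not:
    sub: object

@dataclass(frozen=True)
class Or:
    left: object
    right: object

@dataclass(frozen=True)
class Implies:
    left: object
    right: object

@dataclass(frozen=True)
class ForAll:
    var: str
    sort: object   # name of the sort, or None for an unsorted quantifier
    body: object

def relativize(f):
    """Erase sorts: 'forall x:S. A' becomes 'forall x. (S(x) -> A)'."""
    if isinstance(f, Atom):
        return f
    if isinstance(f, Not):
        return Not(relativize(f.sub))
    if isinstance(f, (Or, Implies)):
        return type(f)(relativize(f.left), relativize(f.right))
    if isinstance(f, ForAll):
        guard = Atom(f.sort, (f.var,))          # the sort becomes a unary predicate
        return ForAll(f.var, None, Implies(guard, relativize(f.body)))
    raise TypeError(f"unknown formula: {f!r}")

def show(f):
    """Render a formula as plain text."""
    if isinstance(f, Atom):
        return f"{f.pred}({', '.join(f.args)})"
    if isinstance(f, Not):
        return "~" + show(f.sub)
    if isinstance(f, Or):
        return f"({show(f.left)} | {show(f.right)})"
    if isinstance(f, Implies):
        return f"({show(f.left)} -> {show(f.right)})"
    if isinstance(f, ForAll):
        binder = f.var if f.sort is None else f"{f.var}:{f.sort}"
        return f"forall {binder}. {show(f.body)}"
    raise TypeError(f"unknown formula: {f!r}")

# The two-sorted sentence about French and German citizens from the text.
sentence = ForAll("a", "French", ForAll("x", "German",
    Implies(Atom("MarriedTo", ("a", "x")),
            Or(Atom("DrinksWine", ("a",)), Not(Atom("EatsWurst", ("x",)))))))

print(show(relativize(sentence)))
```

Each sorted quantifier is replaced by an unsorted one guarded by the corresponding unary predicate, which is all the translation of formulas requires.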


We need to add axioms that ensure that the sorts are separate
(e.g., $\forall x\, \lnot(\mathrm{German}(x) \land \mathrm{French}(x))$) as well as axioms that
guarantee that “drinks wine” only holds of objects satisfying the
predicate $\mathrm{French}(x)$, etc. With these conventions and axioms, it
is not difficult to show that many-sorted sentences translate to
first-order sentences, and many-sorted derivations translate to
first-order derivations. Also, many-sorted structures “translate”
to corresponding first-order structures and vice-versa, so we
also have a completeness theorem for many-sorted logic.


13.3 Second-Order Logic


The language of second-order logic allows one to quantify not 
just over a domain of individuals, but over relations on that
domain as well. Given a first-order language $\mathcal{L}$, for each $k$ one
adds variables $R$ which range over $k$-ary relations, and allows
quantification over those variables. If $R$ is a variable for a
$k$-ary relation, and $t_1$, ..., $t_k$ are ordinary (first-order) terms,
$R(t_1, \ldots, t_k)$ is an atomic formula. Otherwise, the set of
formulas is defined just as in the case of first-order logic,
with additional clauses for second-order quantification. Note
that we only have the identity predicate for first-order terms:
if $R$ and $S$ are relation variables of the same arity $k$, we can
define $R = S$ to be an abbreviation for

\[ \forall x_1 \ldots \forall x_k\, (R(x_1, \ldots, x_k) \leftrightarrow S(x_1, \ldots, x_k)). \]


The rules for second-order logic simply extend the quantifier
rules to the new second-order variables. Here, however, one has
to be a little bit careful to explain how these variables
interact with the predicate symbols of $\mathcal{L}$, and with formulas of
$\mathcal{L}$ more generally. At the bare minimum, relation variables count
as terms, so one has inferences of the form


\[ A(R) \vdash \exists R\, A(R) \]


But if $\mathcal{L}$ is the language of arithmetic with a constant relation
symbol $<$, one would also expect the following inference to be
valid:


\[ x < y \vdash \exists R\, R(x, y) \]


or for a given formula $A$,

\[ A(x_1, \ldots, x_k) \vdash \exists R\, R(x_1, \ldots, x_k) \]

More generally, we might want to allow inferences of the form

\[ A[\lambda \vec{x}.\, B(\vec{x})/R] \vdash \exists R\, A \]


where $A[\lambda \vec{x}.\, B(\vec{x})/R]$ denotes the result of replacing every
atomic formula of the form $R t_1, \ldots, t_k$ in $A$ by $B(t_1, \ldots, t_k)$.
This last rule is equivalent to having a comprehension schema,
i.e., an axiom of the form


\[ \exists R\, \forall x_1, \ldots, x_k\, (A(x_1, \ldots, x_k) \leftrightarrow R(x_1, \ldots, x_k)), \]


one for each formula $A$ in the second-order language, in which
$R$ is not a free variable. (Exercise: show that if $R$ is allowed to
occur in $A$, this schema is inconsistent!)

When logicians refer to the “axioms of second-order logic” 
they usually mean the minimal extension of first-order logic by 
second-order quantifier rules together with the comprehension 
schema. But it is often interesting to study weaker subsystems of 
these axioms and rules. For example, note that in its full gen- 
erality the axiom schema of comprehension is impredicative: it 
allows one to assert the existence of a relation $R(x_1, \ldots, x_k)$ that
is “defined” by a formula with second-order quantifiers; and these 
quantifiers range over the set of all such relations—a set which 
includes R itself! Around the turn of the twentieth century, a com- 
mon reaction to Russell’s paradox was to lay the blame on such 
definitions, and to avoid them in developing the foundations of 
mathematics. If one prohibits the use of second-order quantifiers 
in the formula A, one has a predicative form of comprehension, 
which is somewhat weaker. 

From the semantic point of view, one can think of a second- 
order structure as consisting of a first-order structure for the lan- 
guage, coupled with a set of relations on the domain over which 
the second-order quantifiers range (more precisely, for each k 
there is a set of relations of arity k). Of course, if comprehen- 
sion is included in the derivation system, then we have the added 
requirement that there are enough relations in the “second-order 
part” to satisfy the comprehension axioms; otherwise the
derivation system is not sound! One easy way to ensure that there
are enough relations around is to take the second-order part to
consist of all the relations on the first-order part. Such a
structure is called full, and, in a sense, is really the
“intended structure” for the language. If we restrict our
attention to full structures we have what is known as the full
second-order semantics. In that case,
specifying a structure boils down to specifying the first-order part, 
since the contents of the second-order part follow from that im- 
plicitly. 
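
The difference between a full structure and one with too few relations can be made concrete on a small finite domain. The sketch below is an illustration only: it handles unary relations, represents defining formulas as parameter-free Python predicates, and the names `all_subsets` and `satisfies_comprehension` are assumptions introduced here rather than anything from the text.

```python
from itertools import combinations

def all_subsets(domain):
    """Every unary relation on the domain: the second-order part of the full structure."""
    items = sorted(domain)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

def satisfies_comprehension(domain, relations, formulas):
    """Check the instances  exists R forall x (A(x) <-> R(x))  for each A in `formulas`.

    Such an instance holds exactly when the set defined by A is among `relations`.
    """
    return all(frozenset(x for x in domain if a(x)) in relations
               for a in formulas)

domain = {0, 1, 2}

# A few defining conditions, chosen only for illustration.
formulas = [lambda x: x == 0, lambda x: x != 1, lambda x: True]

full = all_subsets(domain)                    # all 8 subsets of the domain
sparse = {frozenset(), frozenset(domain)}     # a second-order part that is too small

print(satisfies_comprehension(domain, full, formulas))
print(satisfies_comprehension(domain, sparse, formulas))
```

With the full second-order part every instance is satisfied automatically, while the sparse one has no relation corresponding to the set {0}, so a derivation system with comprehension would not be sound for it.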

To summarize, there is some ambiguity when talking about 
second-order logic. In terms of the derivation system, one might 
have in mind either 


1. A “minimal” second-order derivation system, together with 
some comprehension axioms. 


2. The “standard” second-order derivation system, with full 
comprehension. 


In terms of the semantics, one might be interested in either 


1. The “weak” semantics, where a structure consists of a first- 
order part, together with a second-order part big enough 
to satisfy the comprehension axioms. 


2. The “standard” second-order semantics, in which one con- 
siders full structures only. 


When logicians do not specify the derivation system or the
semantics they have in mind, they are usually referring to the
second item on each list. The advantage to using this semantics
is that,
as we will see, it gives us categorical descriptions of many natural 
mathematical structures; at the same time, the derivation system 
is quite strong, and sound for this semantics. The drawback is 
that the derivation system is not complete for the semantics; in 
fact, no effectively given derivation system is complete for the 
full second-order semantics. On the other hand, we will see that 
the derivation system is complete for the weakened semantics; 
this implies that if a sentence is not provable, then there is some 
structure, not necessarily the full one, in which it is false. 

The language of second-order logic is quite rich. One can 
identify unary relations with subsets of the domain, and so in
particular you can quantify over these sets; for example, one can
express induction for the natural numbers with a single axiom


\[ \forall R\, ((R(0) \land \forall x\, (R(x) \to R(x'))) \to \forall x\, R(x)). \]


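If one takes the language of arithmetic to have symbols $0$, $\prime$, $+$, $\times$
and $<$, one can add the following axioms to describe their
behavior:


1. $\forall x\, \lnot x' = 0$

2. $\forall x\, \forall y\, (x' = y' \to x = y)$

3. $\forall x\, (x + 0) = x$

4. $\forall x\, \forall y\, (x + y') = (x + y)'$

5. $\forall x\, (x \times 0) = 0$

6. $\forall x\, \forall y\, (x \times y') = ((x \times y) + x)$

7. $\forall x\, \forall y\, (x < y \leftrightarrow \exists z\, y = (x + z'))$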


It is not difficult to show that these axioms, together with the
axiom of induction above, provide a categorical description of
the structure $\mathfrak{N}$, the standard model of arithmetic, provided we
are using the full second-order semantics. Given any structure
$\mathfrak{M}$ in which these axioms are true, define a function $f$ from
$\mathbb{N}$ to the domain of $\mathfrak{M}$ using ordinary recursion on $\mathbb{N}$, so that
$f(0) = 0^{\mathfrak{M}}$ and $f(x + 1) = \prime^{\mathfrak{M}}(f(x))$. Using ordinary induction
on $\mathbb{N}$ and the fact that axioms (1) and (2) hold in $\mathfrak{M}$, we see
that $f$ is injective. To see that $f$ is surjective, let $P$ be the
set of elements of $|\mathfrak{M}|$ that are in the range of $f$. Since $\mathfrak{M}$
is full, $P$ is in the second-order domain. By the construction of
$f$, we know that $0^{\mathfrak{M}}$ is in $P$, and that $P$ is closed under
$\prime^{\mathfrak{M}}$. The fact that the induction axiom holds in $\mathfrak{M}$ (in
particular, for $P$) guarantees that $P$ is equal to the entire
first-order domain of $\mathfrak{M}$. This shows that $f$ is a bijection.
Showing that $f$ is a homomorphism is no more difficult, using
ordinary induction on $\mathbb{N}$ repeatedly.

In set-theoretic terms, a function is just a special kind of
relation; for example, a unary function $f$ can be identified with
a binary relation $R$ satisfying $\forall x\, \exists! y\, R(x, y)$. As a result, one
can quantify over functions too. Using the full semantics, one
can then define the class of infinite structures to be the class
of structures $\mathfrak{M}$ for which there is an injective function from
the domain of $\mathfrak{M}$ to a proper subset of itself:


\[ \exists f\, (\forall x\, \forall y\, (f(x) = f(y) \to x = y) \land \exists y\, \forall x\, f(x) \neq y). \]


The negation of this sentence then defines the class of finite struc- 
tures. 
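
On a finite domain the full semantics can be simulated by brute force, since the function quantifier ranges over all functions from the domain to itself. The following sketch (the function name is hypothetical, and the search is exponential, so it only suits tiny domains) checks that the sentence asserting the existence of an injective but not surjective function fails in small finite structures.

```python
from itertools import product

def has_injective_nonsurjective_map(domain):
    """Evaluate 'there is an injective f that misses some y' over ALL f: domain -> domain."""
    elems = list(domain)
    for values in product(elems, repeat=len(elems)):
        f = dict(zip(elems, values))                      # one candidate function
        injective = len(set(f.values())) == len(elems)
        misses_a_point = any(y not in f.values() for y in elems)
        if injective and misses_a_point:
            return True
    return False

for n in range(1, 5):
    print(n, has_injective_nonsurjective_map(range(n)))
```

Every injective function on a finite set is surjective, so the search finds no witness for any of these domains, in line with the claim that the negated sentence defines the finite structures.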

In addition, one can define the class of well-orderings, by 
adding the following to the definition of a linear ordering: 


\[ \forall P\, (\exists x\, P(x) \to \exists x\, (P(x) \land \forall y\, (y < x \to \lnot P(y)))). \]


This asserts that every non-empty set has a least element, modulo 
the identification of “set” with “one-place relation”. For another 
example, one can express the notion of connectedness for graphs, 
by saying that there is no nontrivial separation of the vertices into 
disconnected parts: 


\[ \lnot \exists A\, (\exists x\, A(x) \land \exists y\, \lnot A(y) \land \forall w\, \forall z\, ((A(w) \land \lnot A(z)) \to \lnot R(w, z))). \]
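
For a finite graph, the second-order quantifier in the connectedness sentence can be evaluated directly by letting A range over all subsets of the vertices. The sketch below is a minimal illustration; the name `is_connected` and the choice to treat the edge relation as symmetric are assumptions made here.

```python
from itertools import combinations

def is_connected(vertices, edges):
    """True iff no set A splits the vertices into two nonempty parts with no edge between them."""
    vs = list(vertices)
    adjacent = set(edges) | {(v, u) for (u, v) in edges}   # assume an undirected graph
    for r in range(1, len(vs)):                  # A nonempty with nonempty complement
        for subset in combinations(vs, r):
            a = set(subset)
            crossing = any((w, z) in adjacent
                           for w in a for z in vs if z not in a)
            if not crossing:
                return False                     # A is a nontrivial separation
    return True

print(is_connected([0, 1, 2], [(0, 1), (1, 2)]))   # a path on three vertices
print(is_connected([0, 1, 2], [(0, 1)]))           # vertex 2 is isolated
```

The loop over subsets plays the role of the quantifier over one-place relations, which is exactly the resource that first-order logic lacks here.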


For yet another example, you might try as an exercise to define 
the class of finite structures whose domain has even size. More 
strikingly, one can provide a categorical description of the real 
numbers as a complete ordered field containing the rationals. 

In short, second-order logic is much more expressive than
first-order logic. That’s the good news; now for the bad. We have
already mentioned that there is no effective derivation system
that is complete for the full second-order semantics. For better
or for worse, many of the properties of first-order logic are
absent, including compactness and the Löwenheim-Skolem theorems.

On the other hand, if one is willing to give up the full
second-order semantics in favor of the weaker one, then the
minimal second-order derivation system is complete for this
semantics. In other words, if we read $\vdash$ as “proves in the
minimal system” and $\vDash$ as “logically implies in the weaker
semantics”, we can show that whenever $\Gamma \vDash A$ then $\Gamma \vdash A$. If
one wants to include specific comprehension axioms in the
derivation system, one has to restrict the semantics to
second-order structures that satisfy these axioms: for example,
if $\Delta$ consists of a set of comprehension axioms (possibly all of
them), we have that if $\Gamma \cup \Delta \vDash A$, then $\Gamma \cup \Delta \vdash A$. In
particular, if $A$ is not provable using the comprehension axioms
we are considering, then there is a model of $\lnot A$ in which these
comprehension axioms nonetheless hold.

The easiest way to see that the completeness theorem holds 
for the weaker semantics is to think of second-order logic as a 
many-sorted logic, as follows. One sort is interpreted as the ordi- 
nary “first-order” domain, and then for each k we have a domain 
of “relations of arity $k$.” We take the language to have built-in
relation symbols “$\mathrm{true}_k(R, x_1, \ldots, x_k)$,” each of which is meant
to assert that $R$ holds of $x_1$, ..., $x_k$, where $R$ is a variable of the
sort “$k$-ary relation” and $x_1$, ..., $x_k$ are objects of the
first-order sort.


With this identification, the weak second-order semantics is 
essentially the usual semantics for many-sorted logic; and we have 
already observed that many-sorted logic can be embedded in first- 
order logic. Modulo the translations back and forth, then, the 
weaker conception of second-order logic is really a form of first- 
order logic in disguise, where the domain contains both “objects” 
and “relations” governed by the appropriate axioms. 


13.4 Higher-Order Logic


Passing from first-order logic to second-order logic enabled us 
to talk about sets of objects in the first-order domain, within the 
formal language. Why stop there? For example, third-order logic 
should enable us to deal with sets of sets of objects, or perhaps 
even sets which contain both objects and sets of objects. And
fourth-order logic will let us talk about sets of objects of that
kind. As you may have guessed, one can iterate this idea
arbitrarily.

In practice, higher-order logic is often formulated in terms of
functions instead of relations. (Modulo the natural
identifications, this difference is inessential.) Given some
basic “sorts” $A$, $B$, $C$, ... (which we will now call “types”), we
can create new ones by stipulating

If $\sigma$ and $\tau$ are finite types then so is $\sigma \to \tau$.


Think of types as syntactic “labels,” which classify the objects
we want in our domain; $\sigma \to \tau$ describes those objects that are
functions which take objects of type $\sigma$ to objects of type $\tau$. For
example, we might want to have a type $\Omega$ of truth values, “true”
and “false,” and a type $\mathbb{N}$ of natural numbers. In that case, you
can think of objects of type $\mathbb{N} \to \Omega$ as unary relations, or
subsets of $\mathbb{N}$; objects of type $\mathbb{N} \to \mathbb{N}$ are functions from natural
numbers to natural numbers; and objects of type $(\mathbb{N} \to \mathbb{N}) \to \mathbb{N}$ are
“functionals,” that is, higher-type functions that take functions
to numbers.
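
Readers who program may find it helpful to see these three kinds of objects as ordinary typed functions. The Python sketch below is only an analogy: `Nat` and `Omega` are stand-in aliases for `int` and `bool`, the example functions are invented for illustration, and Python does not enforce the annotations.

```python
from typing import Callable

Nat = int      # stand-in for the type of natural numbers
Omega = bool   # stand-in for the type of truth values

# An object of type N -> Omega: a unary relation, i.e. a subset of N.
is_even: Callable[[Nat], Omega] = lambda n: n % 2 == 0

# An object of type N -> N: a function from numbers to numbers.
double: Callable[[Nat], Nat] = lambda n: 2 * n

# An object of type (N -> N) -> N: a functional, taking a function to a number.
at_three: Callable[[Callable[[Nat], Nat]], Nat] = lambda f: f(3)

print(is_even(4), double(5), at_three(double))
```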

As in the case of second-order logic, one can think of higher- 
order logic as a kind of many-sorted logic, where there is a sort for 
each type of object we want to consider. But it is usually clearer 
just to define the syntax of higher-type logic from the ground up. 
For example, we can define a set of finite types inductively, as 
follows: 


1. $\mathbb{N}$ is a finite type.
2. If $\sigma$ and $\tau$ are finite types, then so is $\sigma \to \tau$.
3. If $\sigma$ and $\tau$ are finite types, so is $\sigma \times \tau$.


Intuitively, $\mathbb{N}$ denotes the type of the natural numbers, $\sigma \to \tau$
denotes the type of functions from $\sigma$ to $\tau$, and $\sigma \times \tau$ denotes the
type of pairs of objects, one from $\sigma$ and one from $\tau$. We can then
define a set of terms inductively, as follows:

1. For each type $\sigma$, there is a stock of variables $x$, $y$, $z$, ... of
type $\sigma$.


2. $0$ is a term of type $\mathbb{N}$.

3. $S$ (successor) is a term of type $\mathbb{N} \to \mathbb{N}$.


4. If $s$ is a term of type $\sigma$, and $t$ is a term of type
$\mathbb{N} \to (\sigma \to \sigma)$, then $R_{st}$ is a term of type $\mathbb{N} \to \sigma$.


5. If $s$ is a term of type $\tau \to \sigma$ and $t$ is a term of type $\tau$, then
$s(t)$ is a term of type $\sigma$.


6. If $s$ is a term of type $\sigma$ and $x$ is a variable of type $\tau$, then
$\lambda x.\, s$ is a term of type $\tau \to \sigma$.


7. If $s$ is a term of type $\sigma$ and $t$ is a term of type $\tau$, then
$\langle s, t \rangle$ is a term of type $\sigma \times \tau$.


8. If $s$ is a term of type $\sigma \times \tau$ then $p_1(s)$ is a term of type $\sigma$
and $p_2(s)$ is a term of type $\tau$.
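
The eight clauses translate almost line by line into a type-checking function. The sketch below is an illustration under assumptions of its own: types and terms are encoded as nested Python tuples, variables carry their type with them (mirroring the separate stock of variables for each type), and the names `arrow`, `prod`, and `type_of` are hypothetical.

```python
N = "N"

def arrow(s, t):
    """The type s -> t."""
    return ("->", s, t)

def prod(s, t):
    """The type s x t."""
    return ("x", s, t)

def type_of(term):
    """Return the type of a term, or raise TypeError if it is ill-formed."""
    tag = term[0]
    if tag == "var":                      # clause 1: ("var", name, type)
        return term[2]
    if tag == "zero":                     # clause 2
        return N
    if tag == "succ":                     # clause 3
        return arrow(N, N)
    if tag == "rec":                      # clause 4: ("rec", s, t)
        sigma = type_of(term[1])
        if type_of(term[2]) != arrow(N, arrow(sigma, sigma)):
            raise TypeError("step term must have type N -> (sigma -> sigma)")
        return arrow(N, sigma)
    if tag == "app":                      # clause 5: ("app", s, t)
        fn, arg = type_of(term[1]), type_of(term[2])
        if fn[0] != "->" or fn[1] != arg:
            raise TypeError("ill-typed application")
        return fn[2]
    if tag == "lam":                      # clause 6: ("lam", x, tau, body)
        return arrow(term[2], type_of(term[3]))
    if tag == "pair":                     # clause 7: ("pair", s, t)
        return prod(type_of(term[1]), type_of(term[2]))
    if tag == "proj":                     # clause 8: ("proj", i, s), i in {1, 2}
        t = type_of(term[2])
        if t[0] != "x":
            raise TypeError("projection of a non-pair")
        return t[term[1]]
    raise ValueError(f"unknown term: {term!r}")

# lambda n. S(S(n)) should have type N -> N.
plus_two = ("lam", "n", N,
            ("app", ("succ",), ("app", ("succ",), ("var", "n", N))))
print(type_of(plus_two))
```

Because each term former has exactly one clause, the type of a term, when it has one, is determined by a single pass over its structure.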


Intuitively, $R_{st}$ denotes the function defined recursively by


\[ R_{st}(0) = s \]
\[ R_{st}(x + 1) = t(x, R_{st}(x)), \]


$\langle s, t \rangle$ denotes the pair whose first component is $s$ and whose
second component is $t$, and $p_1(s)$ and $p_2(s)$ denote the first and
second elements (“projections”) of $s$. Finally, $\lambda x.\, s$ denotes the
function $f$ defined by


\[ f(x) = s \]


for any $x$ of type $\tau$; so item (6) gives us a form of comprehension,
enabling us to define functions using terms. Formulas are built
up from identity predicate statements $s = t$ between terms of the
same type, the usual propositional connectives, and higher-type
quantification. One can then take the axioms of the system to be
the basic equations governing the terms defined above, together
with the usual rules of logic with quantifiers and identity
predicate.
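
The recursion equations for the recursor can be run directly. In the sketch below, which is an informal illustration rather than part of the formal system, `recursor` is a hypothetical Python rendering in which the step term is curried to match its type, and addition and factorial serve as sample definitions.

```python
def recursor(s, t):
    """Return R with R(0) = s and R(x + 1) = t(x)(R(x))."""
    def r(n):
        acc = s
        for x in range(n):        # unfold the two equations from 0 up to n
            acc = t(x)(acc)
        return acc
    return r

def succ(n):
    return n + 1

def add(m):
    """add(m)(0) = m and add(m)(n + 1) = S(add(m)(n))."""
    return recursor(m, lambda x: lambda prev: succ(prev))

# fact(0) = 1 and fact(n + 1) = (n + 1) * fact(n).
fact = recursor(1, lambda x: lambda prev: (x + 1) * prev)

print(add(3)(4), fact(5))
```

Addition ignores the first argument of the step function, while factorial uses it, which is why the step term takes both the current index and the previous value.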

If one augments the finite type system with a type $\Omega$ of truth
values, one has to include axioms which govern its use as well.
In fact, if one is clever, one can get rid of complex formulas
entirely, replacing them with terms of type $\Omega$! The proof system
can then be modified accordingly. The result is essentially the
simple theory of types set forth by Alonzo Church in the 1930s.

As in the case of second-order logic, there are different
versions of higher-type semantics that one might want to use. In
the full version, variables of type $\sigma \to \tau$ range over the set of
all functions from the objects of type $\sigma$ to objects of type $\tau$.
As you might expect, this semantics is too strong to admit a
complete, effective derivation system. But one can consider a
weaker semantics, in which a structure consists of sets of
elements $T_\tau$ for each type $\tau$, together with appropriate
operations for application, projection, etc. If the details are
carried out correctly, one can obtain completeness theorems for
the kinds of derivation systems described above.

Higher-type logic is attractive because it provides a frame- 
work in which we can embed a good deal of mathematics in a 
natural way: starting with N, one can define real numbers, con- 
tinuous functions, and so on. It is also particularly attractive in 
the context of intuitionistic logic, since the types have clear
“constructive” interpretations. In fact, one can develop constructive
versions of higher-type semantics (based on intuitionistic, rather 
than classical logic) that clarify these constructive interpretations 
quite nicely, and are, in many ways, more interesting than the 
classical counterparts. 


13.5 Intuitionistic Logic 


In contrast to second-order and higher-order logic, intuitionistic
first-order logic represents a restriction of the classical
version, intended to model a more “constructive” kind of
reasoning. The following examples may serve to illustrate some of
the underlying motivations.

Suppose someone came up to you one day and announced 
that they had determined a natural number x, with the property 
that if x is prime, the Riemann hypothesis is true, and if x is com- 
posite, the Riemann hypothesis is false. Great news! Whether the 
Riemann hypothesis is true or not is one of the big open ques- 
tions of mathematics, and here they seem to have reduced the 
problem to one of calculation, that is, to the determination of 
whether a specific number is prime or not. 

What is the magic value of x? They describe it as follows: x is 
the natural number that is equal to 7 if the Riemann hypothesis 
is true, and 9 otherwise. 

Angrily, you demand your money back. From a classical point 
of view, the description above does in fact determine a unique 
value of x; but what you really want is a value of x that is given 
explicitly. 

To take another, perhaps less contrived example, consider 
the following question. We know that it is possible to raise an 
irrational number to a rational power, and get a rational result. 
For example, $\sqrt{2}^2 = 2$. What is less clear is whether or not it is
possible to raise an irrational number to an irrational power,
and get a rational result. The following theorem answers this in
the affirmative:


Theorem. There are irrational numbers $a$ and $b$ such that $a^b$ is
rational.


Proof. Consider $\sqrt{2}^{\sqrt{2}}$. If this is rational, we are done: we can
let $a = b = \sqrt{2}$. Otherwise, it is irrational. Then we have


\[ (\sqrt{2}^{\sqrt{2}})^{\sqrt{2}} = \sqrt{2}^{\sqrt{2} \cdot \sqrt{2}} = \sqrt{2}^2 = 2, \]


which is certainly rational. So, in this case, let $a$ be
$\sqrt{2}^{\sqrt{2}}$, and let $b$ be $\sqrt{2}$. □

Does this constitute a valid proof? Most mathematicians feel
that it does. But again, there is something a little bit
unsatisfying here: we have proved the existence of a pair of real
numbers with a certain property, without being able to say which
pair of numbers it is. It is possible to prove the same result,
but in such a way that the pair $a$, $b$ is given in the proof: take
$a = \sqrt{3}$ and $b = \log_3 4$. Then


\[ a^b = \sqrt{3}^{\log_3 4} = 3^{1/2 \cdot \log_3 4} = (3^{\log_3 4})^{1/2} = 4^{1/2} = 2, \]


since $3^{\log_3 x} = x$.
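
Both computations are easy to check numerically. The snippet below is only a floating-point sanity check of the two identities; it says nothing about which of the numbers involved are rational, which is the part that actually requires proof.

```python
import math

sqrt2 = math.sqrt(2)

# Second case of the first proof: (sqrt(2) ** sqrt(2)) ** sqrt(2) should equal 2.
case_two = (sqrt2 ** sqrt2) ** sqrt2

# The explicit pair from the second proof: a = sqrt(3), b = log base 3 of 4.
a = math.sqrt(3)
b = math.log(4, 3)
explicit = a ** b

print(math.isclose(case_two, 2.0), math.isclose(explicit, 2.0))
```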

Intuitionistic logic is designed to model a kind of reasoning 
where moves like the one in the first proof are disallowed. Proving 
the existence of an x satisfying A(x) means that you have to give a 
specific x, and a proof that it satisfies A, like in the second proof. 
Proving that A or B holds requires that you can prove one or the 
other. 

Formally speaking, intuitionistic first-order logic is what you 
get if you restrict a derivation system for first-order logic in a 
certain way. Similarly, there are intuitionistic versions of second- 
order or higher-order logic. From the mathematical point of view, 
these are just formal deductive systems, but, as already noted, 
they are intended to model a kind of mathematical reasoning. 
One can take this to be the kind of reasoning that is justified on 
a certain philosophical view of mathematics (such as Brouwer’s 
intuitionism); one can take it to be a kind of mathematical rea- 
soning which is more “concrete” and satisfying (along the lines 
of Bishop’s constructivism); and one can argue about whether or 
not the formal description captures the informal motivation. But 
whatever philosophical positions we may hold, we can study in- 
tuitionistic logic as a formally presented logic; and for whatever 
reasons, many mathematical logicians find it interesting to do so. 

There is an informal constructive interpretation of the
intuitionist connectives, usually known as the BHK interpretation
(named after Brouwer, Heyting, and Kolmogorov). It runs as
follows: a proof of $A \land B$ consists of a proof of $A$ paired with a
proof of $B$; a proof of $A \lor B$ consists of either a proof of $A$, or
a proof of $B$, where we have explicit information as to which is
the case; a proof of $A \to B$ consists of a procedure, which
transforms a proof of $A$ to a proof of $B$; a proof of $\forall x\, A(x)$
consists of a procedure which returns a proof of $A(x)$ for any
value of $x$; and a proof of $\exists x\, A(x)$ consists of a value of $x$,
together with a proof that this value satisfies $A$. One can
describe the interpretation in computational terms known as the
“Curry-Howard isomorphism” or the “formulas-as-types paradigm”:
think of a formula as specifying a certain kind of data type, and
proofs as computational objects of these data types that enable
us to see that the corresponding formula is true.
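
The proofs-as-data reading can be imitated loosely in Python. The sketch below is an informal analogy under assumptions of its own: pairs stand for proofs of conjunctions, tagged values for proofs of disjunctions, and functions for proofs of implications. Python does not check that a "proof" really has the shape its formula demands, so nothing here is verified by the language, and all function names are invented for illustration.

```python
def and_comm(proof):
    """From a proof of A and B (a pair) build a proof of B and A."""
    a, b = proof
    return (b, a)

def or_comm(proof):
    """From a proof of A or B (a tagged proof) build a proof of B or A."""
    tag, p = proof
    return ("right", p) if tag == "left" else ("left", p)

def compose(f, g):
    """From proofs of A -> B and B -> C (procedures) build a proof of A -> C."""
    return lambda a: g(f(a))

def exists_intro(witness, evidence):
    """A proof of an existential: a value together with a proof about that value."""
    return (witness, evidence)

print(and_comm(("proof of A", "proof of B")))
print(or_comm(("left", "proof of A")))
```

Note that `or_comm` can inspect the tag to see which disjunct was proved, which is the explicit information the BHK clause for disjunction asks for.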

Intuitionistic logic is often thought of as being classical logic 
“minus” the law of the excluded middle. The following theorem
makes this more precise. 


Obtaining instances of one schema from either of the others is a 
good exercise in intuitionistic logic. 

The first deductive systems for intuitionistic propositional 
logic, put forth as formalizations of Brouwer’s intuitionism, are 
due, independently, to Kolmogorov, Glivenko, and Heyting. The 
first formalization of intuitionistic first-order logic (and parts of 
intuitionist mathematics) is due to Heyting. Though a number 
of classically valid schemata are not intuitionistically valid, many 
are. 

The double-negation translation describes an important
relationship between classical and intuitionist logic. It is
defined inductively as follows (think of $A^N$ as the “intuitionist”
translation of the classical formula $A$):


\[ A^N \equiv \lnot\lnot A \quad \text{for atomic formulas } A \]
\[ (A \land B)^N \equiv (A^N \land B^N) \]
\[ (A \lor B)^N \equiv \lnot\lnot (A^N \lor B^N) \]
\[ (A \to B)^N \equiv (A^N \to B^N) \]
\[ (\forall x\, A)^N \equiv \forall x\, A^N \]
\[ (\exists x\, A)^N \equiv \lnot\lnot \exists x\, A^N \]
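
Since the translation is defined by recursion on formulas, it is straightforward to implement. The following sketch assumes a tuple encoding of formulas chosen here for illustration (tags `"atom"`, `"and"`, `"or"`, `"imp"`, `"all"`, `"ex"`) and covers exactly the six clauses of the definition; the names `neg` and `dn` are hypothetical.

```python
def neg(f):
    """Build the negation of a formula."""
    return ("not", f)

def dn(f):
    """Double-negation translation, one branch per clause of the definition."""
    tag = f[0]
    if tag == "atom":
        return neg(neg(f))
    if tag == "and":
        return ("and", dn(f[1]), dn(f[2]))
    if tag == "or":
        return neg(neg(("or", dn(f[1]), dn(f[2]))))
    if tag == "imp":
        return ("imp", dn(f[1]), dn(f[2]))
    if tag == "all":                      # ("all", variable, body)
        return ("all", f[1], dn(f[2]))
    if tag == "ex":                       # ("ex", variable, body)
        return neg(neg(("ex", f[1], dn(f[2]))))
    raise ValueError(f"no clause for {f!r}")

P, Q = ("atom", "P"), ("atom", "Q")
print(dn(("or", P, Q)))
```

Only atomic formulas, disjunctions, and existential quantifiers pick up a double negation; conjunction, implication, and the universal quantifier are translated componentwise.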


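Kolmogorov and Glivenko had versions of this translation for
propositional logic; for predicate logic, it is due to Gödel and
Gentzen, independently. We have the following: $A \leftrightarrow A^N$ is
provable classically, and if $A$ is provable classically, then
$A^N$ is provable intuitionistically.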


We can now envision the following dialogue. Classical math- 
ematician: “I’ve proved A!” Intuitionist mathematician: “Your 
proof isn’t valid. What you’ve really proved is $A^N$.” Classical
mathematician: “Fine by me!” As far as the classical mathemati- 
cian is concerned, the intuitionist is just splitting hairs, since the 
two are equivalent. But the intuitionist insists there is a differ- 
ence. 

Note that the above translation concerns pure logic only; it
does not address the question as to what the appropriate
nonlogical axioms are for classical and intuitionistic
mathematics, or what the relationship is between them. But the
following slight extension of the theorem above provides some
useful information: if $\Gamma$ proves $A$ classically, then $\Gamma^N$ proves
$A^N$ intuitionistically, where $\Gamma^N$ is the set of translations of the
sentences in $\Gamma$.


In other words, if $A$ is provable from some hypotheses
classically, then $A^N$ is provable from their double-negation
translations.

To show that a sentence or propositional formula is
intuitionistically valid, all you have to do is provide a proof.
But how can you show that it is not valid? For that purpose, we
need a semantics that is sound, and preferably complete. A
semantics due to Kripke nicely fits the bill.

We can play the same game we did for classical logic: de- 
fine the semantics, and prove soundness and completeness. It 
is worthwhile, however, to note the following distinction. In the 
case of classical logic, the semantics was the “obvious” one, in 
a sense implicit in the meaning of the connectives. Though one 
can provide some intuitive motivation for Kripke semantics, the 
latter does not offer the same feeling of inevitability. In addi- 
tion, the notion of a classical structure is a natural mathematical 
one, so we can either take the notion of a structure to be a tool 
for studying classical first-order logic, or take classical first-order 
logic to be a tool for studying mathematical structures. In con- 
trast, Kripke structures can only be viewed as a logical construct; 
they don’t seem to have independent mathematical interest. 

A Kripke structure 𝔐 = (W, R, V) for a propositional language
consists of a set W, a partial order R on W with a least element,
and a “monotone” assignment of propositional variables to the
elements of W (if a variable is assigned to u and v ≥ u, then it is
also assigned to v). The intuition is that the elements of W
represent “worlds,” or “states of knowledge”; an element v ≥ u
represents a “possible future state” of u; and the propositional
variables assigned to u are the propositions that are known to be
true in state u. The forcing relation 𝔐, w ⊩ A then extends this
relationship to arbitrary formulas in the language; read 𝔐, w ⊩ A
as “A is true in state w.” The relationship is defined inductively,
as follows:


1. 𝔐, w ⊩ pᵢ iff pᵢ is one of the propositional variables
assigned to w.


2. 𝔐, w ⊮ ⊥.


3. 𝔐, w ⊩ (A ∧ B) iff 𝔐, w ⊩ A and 𝔐, w ⊩ B.


4. 𝔐, w ⊩ (A ∨ B) iff 𝔐, w ⊩ A or 𝔐, w ⊩ B.


5. 𝔐, w ⊩ (A → B) iff, whenever w′ ≥ w and 𝔐, w′ ⊩ A, then
𝔐, w′ ⊩ B.


It is a good exercise to try to show that ¬(p ∧ q) → (¬p ∨ ¬q) is 
not intuitionistically valid, by cooking up a Kripke structure that 
provides a counterexample. 
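
For readers who want to check a candidate structure mechanically, the forcing clauses can be turned into a short recursive function. This is a sketch under our own assumptions: formulas are nested tuples, ¬A is treated as A → ⊥, and the three-world structure below is just one possible counterexample, not necessarily the one the exercise has in mind.

```python
BOT = ("bot",)


def neg(a):
    # Assumption of this sketch: ¬A abbreviates A → ⊥.
    return ("imp", a, BOT)


def forces(worlds, order, val, w, a):
    """Decide whether world w forces formula a.

    order is a set of pairs (u, v) meaning u ≤ v; val maps each world
    to the set of propositional variables assigned to it.
    """
    tag = a[0]
    if tag == "var":
        return a[1] in val[w]
    if tag == "bot":
        return False
    if tag == "and":
        return (forces(worlds, order, val, w, a[1])
                and forces(worlds, order, val, w, a[2]))
    if tag == "or":
        return (forces(worlds, order, val, w, a[1])
                or forces(worlds, order, val, w, a[2]))
    if tag == "imp":
        # Look at every world v ≥ w, including w itself.
        return all(
            not forces(worlds, order, val, v, a[1])
            or forces(worlds, order, val, v, a[2])
            for v in worlds if (w, v) in order
        )
    raise ValueError(f"unknown connective {tag!r}")


# A root 0 with two incomparable successors: p becomes known in
# world 1, q becomes known in world 2.
WORLDS = [0, 1, 2]
ORDER = {(0, 0), (1, 1), (2, 2), (0, 1), (0, 2)}
VAL = {0: set(), 1: {"p"}, 2: {"q"}}

p, q = ("var", "p"), ("var", "q")
FORMULA = ("imp", neg(("and", p, q)), ("or", neg(p), neg(q)))

print(forces(WORLDS, ORDER, VAL, 0, FORMULA))
```

At the root, ¬(p ∧ q) is forced because no world forces both p and q, but neither ¬p nor ¬q is forced, since each of p and q holds in some later world.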


13.6 Modal Logics 
Consider the following example of a conditional sentence: 


If Jeremy is alone in that room, then he is drunk and 
naked and dancing on the chairs. 


This is an example of a conditional assertion that may be
materially true but nonetheless misleading, since it seems to suggest
that there is a stronger link between the antecedent and the
consequent than simply that either the antecedent is false or the
consequent true. That is, the wording suggests that the claim is 
not only true in this particular world (where it may be trivially 
true, because Jeremy is not alone in the room), but that, more- 
over, the conclusion would have been true had the antecedent 
been true. In other words, one can take the assertion to mean 
that the claim is true not just in this world, but in any “possible” 
world; or that it is necessarily true, as opposed to just true in this 
particular world. 

Modal logic was designed to make sense of this kind of
necessity. One obtains modal propositional logic from ordinary
propositional logic by adding a box operator; which is to say, if A
is a formula, so is □A. Intuitively, □A asserts that A is necessarily
true, or true in any possible world. ◇A is usually taken to be
an abbreviation for ¬□¬A, and can be read as asserting that A is
possibly true. Of course, modality can be added to predicate logic
as well.

Kripke structures can be used to provide a semantics for
modal logic; in fact, Kripke first designed this semantics with
modal logic in mind. Rather than restricting to partial orders,
more generally one has a set of “possible worlds,” P, and a
binary “accessibility” relation R(x, y) between worlds. Intuitively,
R(p, q) asserts that the world q is compatible with p; i.e., if we are
“in” world p, we have to entertain the possibility that the world
could have been like q.

Modal logic is sometimes called an “intensional” logic, as
opposed to an “extensional” one. The intended semantics for an
extensional logic, like classical logic, will only refer to a single
world, the “actual” one; while the semantics for an “intensional”
logic relies on a more elaborate ontology. In addition to
structuring necessity, one can use modality to structure other
linguistic constructions, reinterpreting □ and ◇ according to the
application. For example:


1. In provability logic, □A is read “A is provable” and ◇A is
read “A is consistent.”


2. In epistemic logic, one might read □A as “I know A” or “I
believe A.”


3. In temporal logic, one can read □A as “A is always true”
and ◇A as “A is sometimes true.”


One would like to augment logic with rules and axioms deal- 
ing with modality. For example, the system S4 consists of the 
ordinary axioms and rules of propositional logic, together with 
the following axioms: 


□(A → B) → (□A → □B)
□A → A
□A → □□A


as well as a rule, “from A conclude □A.” S5 adds the following
axiom:


◇A → □◇A


Variations of these axioms may be suitable for different applica- 
tions; for example, S5 is usually taken to characterize the notion 
of logical necessity. And the nice thing is that one can usually 
find a semantics for which the derivation system is sound and 
complete by restricting the accessibility relation in the Kripke 
structures in natural ways. For example, S4 corresponds to the 
class of Kripke structures in which the accessibility relation is 
reflexive and transitive. S5 corresponds to the class of Kripke 
structures in which the accessibility relation is universal, which 
is to say that every world is accessible from every other; so □A 
holds if and only if A holds in every world. 
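
To see how restrictions on the accessibility relation interact with the axioms, one can evaluate modal formulas on a small Kripke structure. The sketch below is illustrative only: the tuple encoding, the function holds, and the two-world structure are our own assumptions. The structure is reflexive and transitive but not universal; checking a few instances on one structure is, of course, no substitute for a soundness proof.

```python
def holds(worlds, access, val, w, a):
    """Decide whether modal formula a is true at world w."""
    tag = a[0]
    if tag == "var":
        return a[1] in val[w]
    if tag == "not":
        return not holds(worlds, access, val, w, a[1])
    if tag == "imp":
        return (not holds(worlds, access, val, w, a[1])
                or holds(worlds, access, val, w, a[2]))
    if tag == "box":   # true in every accessible world
        return all(holds(worlds, access, val, v, a[1])
                   for v in worlds if (w, v) in access)
    if tag == "dia":   # true in some accessible world
        return any(holds(worlds, access, val, v, a[1])
                   for v in worlds if (w, v) in access)
    raise ValueError(f"unknown connective {tag!r}")


# Reflexive and transitive, but world 0 is not accessible from world 1.
WORLDS = [0, 1]
ACCESS = {(0, 0), (0, 1), (1, 1)}
VAL = {0: {"p", "q"}, 1: {"q"}}


def axiom_t(a):      # □A → A
    return ("imp", ("box", a), a)


def axiom_4(a):      # □A → □□A
    return ("imp", ("box", a), ("box", ("box", a)))


def axiom_s5(a):     # ◇A → □◇A
    return ("imp", ("dia", a), ("box", ("dia", a)))


p, q = ("var", "p"), ("var", "q")
for w in WORLDS:
    print(w, holds(WORLDS, ACCESS, VAL, w, axiom_s5(p)))
```

On this structure the instances of the S4 axioms for p and q hold at both worlds, while the S5 axiom for p fails at world 0: p is possible there, but from world 1 no world where p holds is accessible.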


13.7 Other Logics 


As you may have gathered by now, it is not hard to design a new 
logic. You too can create your own syntax, make up a deductive 
system, and fashion a semantics to go with it. You might have to 
be a bit clever if you want the derivation system to be complete 
for the semantics, and it might take some effort to convince the 
world at large that your logic is truly interesting. But, in return, 
you can enjoy hours of good, clean fun, exploring your logic’s 
mathematical and computational properties. 

Recent decades have witnessed a veritable explosion of for- 
mal logics. Fuzzy logic is designed to model reasoning about 
vague properties. Probabilistic logic is designed to model reason- 
ing about uncertainty. Default logics and nonmonotonic logics 
are designed to model defeasible forms of reasoning, which is to 
say, “reasonable” inferences that can later be overturned in the 
face of new information. There are epistemic logics, designed 
to model reasoning about knowledge; causal logics, designed to 
model reasoning about causal relationships; and even “deontic”
logics, which are designed to model reasoning about moral and
ethical obligations. Depending on whether the primary motivation
for introducing these systems is philosophical, mathematical,
or computational, you may find such creatures studied under the
rubric of mathematical logic, philosophical logic, artificial
intelligence, cognitive science, or elsewhere.

The list goes on and on, and the possibilities seem endless. 
We may never attain Leibniz’ dream of reducing all of human 
reason to calculation—but that can’t stop us from trying. 


PART III


Turing Machines


CHAPTER 14


Turing Machine Computations


14.1 Introduction 


What does it mean for a function, say, from ℕ to ℕ to be
computable? Among the first answers, and the best-known one,
is that a function is computable if it can be computed by a Turing
machine. This notion was set out by Alan Turing in 1936.
Turing machines are an example of a model of computation: they
are a mathematically precise way of defining the idea of a
“computational procedure.” What exactly that means is debated, but
it is widely agreed that Turing machines are one way of
specifying computational procedures. Even though the term “Turing
machine” evokes the image of a physical machine with moving
parts, strictly speaking a Turing machine is a purely mathematical
construct, and as such it idealizes the idea of a computational
procedure. For instance, we place no restriction on either
the time or memory requirements of a Turing machine: Turing
machines can compute something even if the computation would
require more storage space or more steps than there are atoms in
the universe.


Figure 14.1: A Turing machine executing its program.

It is perhaps best to think of a Turing machine as a program
for a special kind of imaginary mechanism. This mechanism
consists of a tape and a read-write head. In our version of Turing
machines, the tape is infinite in one direction (to the right), and it
is divided into squares, each of which may contain a symbol from
a finite alphabet. Such alphabets can contain any number of
different symbols, but we will mainly make do with three: ▷, ⊔, and
I. When the mechanism is started, the tape is empty (i.e., each
square contains the symbol ⊔) except for the leftmost square,
which contains ▷, and a finite number of squares which contain
the input. At any time, the mechanism is in one of a finite number
of states. At the outset, the head scans the leftmost square and
the mechanism is in a specified initial state. At each step of the
mechanism’s run, the content of the square currently scanned
together with the state the mechanism is in and the Turing machine
program determine what happens next. The Turing machine program
is given by a partial function which takes as input a state q and a
symbol σ and outputs a triple (q′, σ′, D). Whenever the mechanism
is in state q and reads symbol σ, it replaces the symbol on the
current square with σ′, the head moves left, right, or stays put
according to whether D is L, R, or N, and the mechanism goes into
state q′.

For instance, consider the situation in Figure 14.1. The visible
part of the tape of the Turing machine contains the end-of-tape
symbol ▷ on the leftmost square, followed by three 1’s, a 0, and
four more 1’s. The head is reading the third square from the left,
which contains a 1, and is in state q₁ (we say “the machine is
reading a 1 in state q₁”). If the program of the Turing machine
returns, for input (q₁, 1), the triple (q₂, 0, N), the machine would
now replace the 1 on the third square with a 0, leave the read/write
head where it is, and switch to state q₂. If then the program
returns (q₃, 0, R) for input (q₂, 0), the machine would now overwrite
the 0 with another 0 (effectively, leaving the content of the tape
under the read/write head unchanged), move one square to the
right, and enter state q₃. And so on.

We say that the machine halts when it encounters some state,
qₙ, and symbol, σ, such that there is no instruction for (qₙ, σ),
i.e., the transition function for input (qₙ, σ) is undefined. In other
words, the machine has no instruction to carry out, and at that
point, it ceases operation. Halting is sometimes represented by
a specific halt state h. This will be demonstrated in more detail
later on.
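
The mechanism just described is easy to simulate. The following sketch is not part of the formal development: it assumes the program is stored as a Python dictionary (a natural stand-in for a partial function), and the names step, DELTA, and BLANK are ours. It replays the two instructions from the example above and then halts because no instruction exists for (q₃, 1).

```python
BLANK = "⊔"


def step(delta, tape, head, state):
    """Carry out one instruction in place.

    Returns the new (head, state), or None if the machine halts
    because delta has no entry for (state, scanned symbol).
    """
    key = (state, tape[head])
    if key not in delta:
        return None
    new_state, symbol, move = delta[key]
    tape[head] = symbol
    if move == "L":
        head = max(head - 1, 0)    # at the left end the head stays put
    elif move == "R":
        head += 1
        if head == len(tape):      # reveal one more blank square
            tape.append(BLANK)
    return head, new_state


# Only the two instructions mentioned in the text; everything else
# is left undefined.
DELTA = {
    ("q1", "1"): ("q2", "0", "N"),
    ("q2", "0"): ("q3", "0", "R"),
}

tape = list("▷11101111")           # end marker, 111, 0, 1111
config = (2, "q1")                 # third square from the left
history = []
while config is not None:
    history.append(config)
    config = step(DELTA, tape, *config)

print(history)
print("".join(tape))
```

Using a dictionary means that "undefined" is simply "key not present", which matches the notion of halting used here.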

The beauty of Turing’s paper, “On computable numbers,” 
is that he presents not only a formal definition, but also an ar- 
gument that the definition captures the intuitive notion of com- 
putability. From the definition, it should be clear that any func- 
tion computable by a Turing machine is computable in the intu- 
itive sense. Turing offers three types of argument that the con- 
verse is true, i.e., that any function that we would naturally regard 
as computable is computable by such a machine. They are (in 
Turing’s words): 


1. A direct appeal to intuition. 


2. A proof of the equivalence of two definitions (in case the 
new definition has a greater intuitive appeal). 


3. Giving examples of large classes of numbers which are
computable.


Our goal is to try to define the notion of computability “in
principle,” i.e., without taking into account practical limitations of
time and space. Of course, with the broadest definition of
computability in place, one can then go on to consider computation
with bounded resources; this forms the heart of the subject known 
as “computational complexity.” 


14.2 Representing Turing Machines 


Turing machines can be represented visually by state diagrams. 
The diagrams are composed of state cells connected by arrows. 
Unsurprisingly, each state cell represents a state of the machine. 
Each arrow represents an instruction that can be carried out from 
that state, with the specifics of the instruction written above or 
below the appropriate arrow. Consider the following machine, 
which has only two internal states, q₀ and q₁, and one instruction:


start → (q₀) --⊔,I,R--> (q₁)


Recall that the Turing machine has a read/write head and a tape
with the input written on it. The instruction can be read as: if
reading a ⊔ in state q₀, write an I, move right, and move to state q₁.
This is equivalent to the transition function mapping (q₀, ⊔) to
(q₁, I, R).


Example 14.1. Even Machine: The following Turing machine
halts if, and only if, there are an even number of I’s on the tape
(under the assumption that all I’s come before the first ⊔ on the
tape).


start → (q₀)
(q₀) --I,I,R--> (q₁)
(q₁) --I,I,R--> (q₀)
(q₁) --⊔,⊔,R--> (q₁)


The state diagram corresponds to the following transition
function:


δ(q₀, I) = (q₁, I, R),
δ(q₁, I) = (q₀, I, R),
δ(q₁, ⊔) = (q₁, ⊔, R)


The above machine halts only when the input is an even num- 
ber of strokes. Otherwise, the machine (theoretically) continues 
to operate indefinitely. For any machine and input, it is possi- 
ble to trace through the configurations of the machine in order to 
determine the output. We will give a formal definition of config- 
urations later. For now, we can intuitively think of configurations 
as a series of diagrams showing the state of the machine at any 
point in time during operation. Configurations show the con- 
tent of the tape, the state of the machine and the location of the 
read/write head. 

Let us trace through the configurations of the even machine 
if it is started with an input of four I’s. In this case, we expect
that the machine will halt. We will then run the machine on an
input of three I’s, where the machine will run forever.

The machine starts in state q₀, scanning the leftmost I. We
can represent the initial state of the machine as follows:


▷ I₀ I I I ⊔ ...


The above configuration is straightforward. As can be seen, the
machine starts in state q₀, scanning the leftmost I. This is
represented by a subscript of the state name on the first I. The
applicable instruction at this point is δ(q₀, I) = (q₁, I, R), and so
the machine moves right on the tape and changes to state q₁.


▷ I I₁ I I ⊔ ...


Since the machine is now in state q₁ scanning an I, we have to
“follow” the instruction δ(q₁, I) = (q₀, I, R). This results in the
configuration


▷ I I I₀ I ⊔ ...


As the machine continues, the rules are applied again in the same
order, resulting in the following two configurations:


▷ I I I I₁ ⊔ ...


▷ I I I I ⊔₀ ...


The machine is now in state q₀ scanning a ⊔. Based on the
transition diagram, we can easily see that there is no instruction to
be carried out, and thus the machine has halted. This means that
the input has been accepted.

Suppose next we start the machine with an input of three I’s.
The first few configurations are similar, as the same instructions
are carried out, with only a small difference of the tape input:


▷ I₀ I I ⊔ ...


▷ I I₁ I ⊔ ...


▷ I I I₀ ⊔ ...


▷ I I I ⊔₁ ...


The machine has now traversed past all the I’s, and is reading
a ⊔ in state q₁. As shown in the diagram, there is an instruction
of the form δ(q₁, ⊔) = (q₁, ⊔, R). Since the tape is filled with ⊔
indefinitely to the right, the machine will continue to execute this
instruction forever, staying in state q₁ and moving ever further to
the right. The machine will never halt, and does not accept the
input.
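
The two traces can be reproduced with a few lines of code. This is a minimal sketch: the dictionary EVEN encodes the transition function of the even machine, while the function run and its max_steps cutoff are our own additions. Note that a bounded simulation can only report that the machine has not halted yet; it cannot by itself show that the machine never halts.

```python
EVEN = {
    ("q0", "I"): ("q1", "I", "R"),
    ("q1", "I"): ("q0", "I", "R"),
    ("q1", "⊔"): ("q1", "⊔", "R"),
}


def run(delta, strokes, max_steps=50):
    """Run delta on a block of strokes; stop after max_steps steps."""
    tape = ["▷"] + ["I"] * strokes + ["⊔"]
    head, state = 1, "q0"          # scanning the leftmost input square
    for steps in range(max_steps):
        key = (state, tape[head])
        if key not in delta:       # no instruction: the machine halts
            return ("halted", steps, state)
        state, symbol, move = delta[key]
        tape[head] = symbol
        if move == "L":
            head = max(head - 1, 0)
        elif move == "R":
            head += 1
            if head == len(tape):  # extend the tape with a blank
                tape.append("⊔")
    return ("still running", max_steps, state)


print(run(EVEN, 4))
print(run(EVEN, 3))
```

With four strokes the machine halts after four steps in state q₀; with three strokes it is still moving right in state q₁ when the step bound is reached.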

It is important to note that not all machines will halt. If halt- 
ing means that the machine runs out of instructions to execute, 
then we can create a machine that never halts simply by ensuring 
that there is an outgoing arrow for each symbol at each state. 
The even machine can be modified to run indefinitely by adding 
an instruction for scanning a ⊔ at q₀.


Example 14.2.


start → (q₀)
(q₀) --I,I,R--> (q₁)
(q₁) --I,I,R--> (q₀)
(q₀) --⊔,⊔,R--> (q₀)
(q₁) --⊔,⊔,R--> (q₁)


Machine tables are another way of representing Turing
machines. Machine tables have the tape alphabet displayed along the
top, one column per symbol, and the set of machine states down the
side, one row per state. Inside the table, at the intersection of
each state and symbol, is written the rest of the instruction: the
new state, new symbol, and direction of movement. Machine tables
make it easy to determine in what state, and for what symbol, the
machine halts: wherever there is a gap in the table, there is a
possible point for the machine to halt. Unlike state diagrams and
instruction sets, where the points at which the machine halts are
not always immediately obvious, any halting points are quickly
identified by finding the gaps in the machine table.


Example 14.3. The machine table for the even machine is: 


        ⊔          I          ▷
q₀                 I,q₁,R
q₁      ⊔,q₁,R     I,q₀,R


As we can see, the machine halts when scanning a ⊔ in state q₀.
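
Finding the gaps in a machine table is a purely mechanical task. As a small illustration (the function names table_rows and gaps are hypothetical, and the transition function is again stored as a dictionary), the following sketch builds the rows of the table for the even machine and lists every state and symbol pair without an instruction.

```python
STATES = ["q0", "q1"]
ALPHABET = ["⊔", "I", "▷"]
EVEN = {
    ("q0", "I"): ("q1", "I", "R"),
    ("q1", "I"): ("q0", "I", "R"),
    ("q1", "⊔"): ("q1", "⊔", "R"),
}


def table_rows(states, alphabet, delta):
    """One row per state, one cell per symbol; gaps are empty strings."""
    rows = []
    for q in states:
        row = []
        for s in alphabet:
            if (q, s) in delta:
                new_state, written, move = delta[(q, s)]
                row.append(f"{written},{new_state},{move}")
            else:
                row.append("")
        rows.append(row)
    return rows


def gaps(states, alphabet, delta):
    """All (state, symbol) pairs at which the machine would halt."""
    return [(q, s) for q in states for s in alphabet
            if (q, s) not in delta]


for q, row in zip(STATES, table_rows(STATES, ALPHABET, EVEN)):
    print(q, row)
print(gaps(STATES, ALPHABET, EVEN))
```

Besides the gap for ⊔ in state q₀, the table also has gaps in the ▷ column; these are never reached when the machine is started on the leftmost input square and only moves right.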


So far we have only considered machines that read and accept 
input. However, Turing machines have the capacity to both read 
and write. An example of such a machine (although there are 
many, many examples) is a doubler. A doubler, when started with 
a block of n I’s on the tape, outputs a block of 2n I’s. 


Example 14.4. Before building a doubler machine, it is
important to come up with a strategy for solving the problem. Since
the machine (as we have formulated it) cannot remember how many
I’s it has read, we need to come up with a way to keep track of all
the I’s on the tape. One such way is to separate the output from
the input with a ⊔. The machine can then erase the first I from
the input, traverse over the rest of the input, leave a ⊔, and write
two new I’s. The machine will then go back and find the second
I in the input, and double that one as well. For each I of
input, it will write two I’s of output. By erasing the input as the
machine goes, we can guarantee that no I is missed or doubled
twice. When the entire input is erased, there will be 2n I’s left
on the tape. The state diagram of the resulting Turing machine
is depicted in Figure 14.2.


Figure 14.2: A doubler machine
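
The strategy can be tried out in a simulation. The transition table below is one possible way of realizing the strategy with six states; it is our own sketch, and its states and arrows need not coincide with those of Figure 14.2. The helper run is likewise hypothetical.

```python
DOUBLER = {
    ("q0", "I"): ("q1", "⊔", "R"),   # erase the first remaining input stroke
    ("q1", "I"): ("q1", "I", "R"),   # skip the rest of the input
    ("q1", "⊔"): ("q2", "⊔", "R"),   # cross the separating blank
    ("q2", "I"): ("q2", "I", "R"),   # skip the output written so far
    ("q2", "⊔"): ("q3", "I", "R"),   # write the first new stroke
    ("q3", "⊔"): ("q4", "I", "L"),   # write the second new stroke
    ("q4", "I"): ("q4", "I", "L"),   # walk back over the output
    ("q4", "⊔"): ("q5", "⊔", "L"),   # cross the separator again
    ("q5", "I"): ("q5", "I", "L"),   # walk back over the input
    ("q5", "⊔"): ("q0", "⊔", "R"),   # step onto the next input stroke
}


def run(delta, n, max_steps=10_000):
    """Run delta on n strokes and return the tape when it halts."""
    tape = ["▷"] + ["I"] * n + ["⊔"]
    head, state = 1, "q0"
    for _ in range(max_steps):
        key = (state, tape[head])
        if key not in delta:
            return tape
        state, symbol, move = delta[key]
        tape[head] = symbol
        if move == "L":
            head = max(head - 1, 0)
        elif move == "R":
            head += 1
            if head == len(tape):
                tape.append("⊔")
    raise RuntimeError("step bound reached before halting")


print("".join(run(DOUBLER, 2)))
```

On input II this sketch halts in state q₀ scanning a ⊔, with four strokes on the tape preceded by blanks where the input and the separator used to be.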


14.3 Turing Machines 


The formal definition of what constitutes a Turing machine looks 
abstract, but is actually simple: it merely packs into one
mathematical structure all the information needed to specify the
workings of a Turing machine. This includes (1) which states the
machine can be in, (2) which symbols are allowed to be on the 
tape, (3) which state the machine should start in, and (4) what 
the instruction set of the machine is. 


Definition 14.5 (Turing machine). A Turing machine M is a
tuple (Q, Σ, q₀, δ) consisting of


1. a finite set of states Q,
2. a finite alphabet Σ which includes ▷ and ⊔,
3. an initial state q₀ ∈ Q,


4. a finite instruction set δ: Q × Σ ⇸ Q × Σ × {L, R, N}.


The partial function δ is also called the transition function of M.
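
In a program, such a tuple can be written down almost verbatim. The sketch below is one possible encoding, not a canonical one: the class name TuringMachine, the field names, and the check is_well_formed are our own, and the partial function δ is represented by a dictionary that simply has no entry where δ is undefined.

```python
from typing import NamedTuple


class TuringMachine(NamedTuple):
    states: frozenset      # Q
    alphabet: frozenset    # Σ
    start: str             # q0
    delta: dict            # partial function, stored as a dict


def is_well_formed(m):
    """Check the conditions of the definition for the tuple m."""
    if not {"▷", "⊔"} <= m.alphabet:
        return False
    if m.start not in m.states:
        return False
    for (q, s), (q2, s2, d) in m.delta.items():
        if q not in m.states or q2 not in m.states:
            return False
        if s not in m.alphabet or s2 not in m.alphabet:
            return False
        if d not in ("L", "R", "N"):
            return False
    return True


EVEN = TuringMachine(
    states=frozenset({"q0", "q1"}),
    alphabet=frozenset({"▷", "⊔", "I"}),
    start="q0",
    delta={
        ("q0", "I"): ("q1", "I", "R"),
        ("q1", "I"): ("q0", "I", "R"),
        ("q1", "⊔"): ("q1", "⊔", "R"),
    },
)

print(is_well_formed(EVEN))
```

Finiteness of Q, Σ, and δ comes for free here, since Python sets and dictionaries are finite objects.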


We assume that the tape is infinite in one direction only. For
this reason it is useful to designate a special symbol ▷ as a marker
for the left end of the tape. This makes it easier for Turing
machine programs to tell when they’re “in danger” of running off
the tape. We could assume that this symbol is never overwritten,
i.e., that δ(q, ▷) = (q′, ▷, x) if δ(q, ▷) is defined. Some textbooks
do this; we do not. You can simply be careful when constructing
your Turing machine that it never overwrites ▷. Moreover,
there are cases where allowing such overwriting provides some
convenient flexibility.


Example 14.6. Even Machine: The even machine is formally the
quadruple (Q, Σ, q₀, δ) where


Q = {q₀, q₁},
Σ = {▷, ⊔, I},
δ(q₀, I) = (q₁, I, R),
δ(q₁, I) = (q₀, I, R),
δ(q₁, ⊔) = (q₁, ⊔, R).


14.4 Configurations and Computations


Recall tracing through the configurations of the even machine 
earlier. The imaginary mechanism consisting of tape, read/write 
head, and Turing machine program is really just an intuitive way 
of visualizing what a Turing machine computation is. Formally, 
we can define the computation of a Turing machine on a given 
input as a sequence of configurations—and a configuration in turn 
is a sequence of symbols (corresponding to the contents of the 
tape at a given point in the computation), a number indicating 
the position of the read/write head, and a state. Using these, 
we can define what the Turing machine M computes on a given 
input. 


Definition 14.7 (Configuration). A configuration of Turing
machine M = (Q, Σ, q₀, δ) is a triple (C, m, q) where


1. C ∈ Σ* is a finite sequence of symbols from Σ,


2. m ∈ ℕ is a number < len(C), and


3. q ∈ Q.


Intuitively, the sequence C is the content of the tape (symbols
of all squares from the leftmost square to the last non-blank or 
previously visited square), m is the number of the square the 
read/write head is scanning (beginning with 0 being the number 
of the leftmost square), and q is the current state of the machine. 


The potential input for a Turing machine is a sequence of 
symbols, usually a sequence that encodes a number in some form. 
The initial configuration of the Turing machine is that configura- 
tion in which we start the Turing machine to work on that input: 
the tape contains the tape end marker immediately followed by 
the input written on the squares to the right, the read/write head 
is scanning the leftmost square of the input (i.e., the square to
the right of the left end marker), and the mechanism is in the
designated start state qo. 


Definition 14.8 (Initial configuration). The initial
configuration of M for input I ∈ Σ* is


(▷ ⌢ I, 1, q₀).


The ⌢ symbol is for concatenation: the input string begins
immediately to the right of the left end marker.


Definition 14.9. We say that a configuration (C, m, q) yields the
configuration (C′, m′, q′) in one step (according to M), iff


1. the m-th symbol of C is σ,
2. the instruction set of M specifies δ(q, σ) = (q′, σ′, D),
3. the m-th symbol of C′ is σ′, and


4. a) D = L and m′ = m − 1 if m > 0, otherwise m′ = 0, or
   b) D = R and m′ = m + 1, or
   c) D = N and m′ = m,


5. if m′ = len(C), then len(C′) = len(C) + 1 and the m′-th
symbol of C′ is ⊔. Otherwise len(C′) = len(C).


6. for all i such that i < len(C) and i ≠ m, C′(i) = C(i).
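
Since δ is a function, a configuration yields at most one other configuration, so the "yields in one step" relation can be computed. The following sketch mirrors the definition; the name next_configuration and the representation of a configuration as a Python triple (tuple of symbols, position, state) are our own choices.

```python
def next_configuration(delta, config):
    """Return the configuration that config yields in one step,
    or None if delta is undefined for the current state and symbol."""
    C, m, q = config
    sigma = C[m]                       # the m-th symbol of C
    if (q, sigma) not in delta:
        return None
    q2, sigma2, D = delta[(q, sigma)]
    C2 = list(C)                       # all other squares stay the same
    C2[m] = sigma2                     # the m-th symbol is overwritten
    if D == "L":
        m2 = m - 1 if m > 0 else 0     # cannot move left of square 0
    elif D == "R":
        m2 = m + 1
    else:                              # D == "N"
        m2 = m
    if m2 == len(C):                   # the head reached a fresh square
        C2.append("⊔")
    return (tuple(C2), m2, q2)


EVEN = {
    ("q0", "I"): ("q1", "I", "R"),
    ("q1", "I"): ("q0", "I", "R"),
    ("q1", "⊔"): ("q1", "⊔", "R"),
}

c0 = (("▷", "I", "I"), 1, "q0")
c1 = next_configuration(EVEN, c0)
c2 = next_configuration(EVEN, c1)
print(c1)
print(c2)
```

Using immutable tuples for configurations makes it safe to keep earlier configurations around, which is convenient when a whole run is recorded as a sequence.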


Definition 14.10. A run of M on input I is a sequence Cᵢ of
configurations of M, where C₀ is the initial configuration of M for
input I, and each Cᵢ yields Cᵢ₊₁ in one step.

We say that M halts on input I after k steps if Cₖ = (C, m, q),
the m-th symbol of C is σ, and δ(q, σ) is undefined. In that case,
the output of M for input I is O, where O is a string of symbols
not ending in ⊔ such that C = ▷ ⌢ ⊔^i ⌢ O ⌢ ⊔^j for some i, j ∈ ℕ.


According to this definition, the output O of M always ends
in a symbol other than ⊔, or, if at time k the entire tape is filled
with ⊔ (except for the leftmost ▷), O is the empty string.
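
Runs and outputs can be computed along the same lines. The sketch below makes a few assumptions of its own: every symbol is a single character, the output is obtained by stripping blanks on both sides (so i is taken as large as possible), an empty input is padded with one blank so that square 1 exists, and the search for a halting configuration is cut off after max_steps steps. The names run and output are hypothetical.

```python
def output(C):
    """Strip the end marker and the surrounding blanks from tape C."""
    return "".join(C[1:]).strip("⊔")


def run(delta, input_symbols, start="q0", max_steps=1000):
    """Return (k, output) if the machine halts after k steps,
    or None if it has not halted within max_steps steps."""
    C = ["▷"] + list(input_symbols)
    if len(C) == 1:                # empty input: square 1 is blank
        C.append("⊔")
    m, q = 1, start                # initial configuration
    for k in range(max_steps + 1):
        key = (q, C[m])
        if key not in delta:       # halting configuration reached
            return k, output(C)
        q, symbol, move = delta[key]
        C[m] = symbol
        if move == "L":
            m = max(m - 1, 0)
        elif move == "R":
            m += 1
            if m == len(C):
                C.append("⊔")
    return None


EVEN = {
    ("q0", "I"): ("q1", "I", "R"),
    ("q1", "I"): ("q0", "I", "R"),
    ("q1", "⊔"): ("q1", "⊔", "R"),
}

print(run(EVEN, "IIII"))
print(run(EVEN, "III"))
```

The even machine never changes the tape, so when it halts its output is just its input.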


14.5 Unary Representation of Numbers 


Turing machines work on sequences of symbols written on their 
tape. Depending on the alphabet a Turing machine uses, these 
sequences of symbols can represent various inputs and outputs. 
Of particular interest, of course, are Turing machines which
compute arithmetical functions, i.e., functions of natural numbers. A
simple way to represent natural numbers is by coding them as
sequences of a single symbol I. If n ∈ ℕ, let I^n be the empty
sequence if n = 0, and otherwise the sequence consisting of exactly
n I’s.


Definition 14.11 (Computation). A Turing machine M
computes the function f: ℕ^k → ℕ iff M halts on input


I^{n₁} ⊔ I^{n₂} ⊔ ... ⊔ I^{nₖ}


with output I^{f(n₁, ..., nₖ)}.


Example 14.12. Addition: Let’s build a machine that computes
the function f(n, m) = n + m. This requires a machine that starts
with two blocks of I’s of length n and m on the tape, and halts
with one block consisting of n + m I’s. The two input blocks of I’s
are separated by a ⊔, so one method would be to write a stroke
on the square containing the ⊔, and erase the last I.
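
The method can be checked with a short simulation. The transition table ADD below is one way to implement it with three states and is offered as a sketch under our own assumptions; the function add and its parameters are hypothetical helpers, not part of the formal definitions.

```python
ADD = {
    ("q0", "I"): ("q0", "I", "R"),   # skip the first block
    ("q0", "⊔"): ("q1", "I", "N"),   # fill the gap with a stroke
    ("q1", "I"): ("q1", "I", "R"),   # skip to the end of the strokes
    ("q1", "⊔"): ("q2", "⊔", "L"),   # step back onto the last stroke
    ("q2", "I"): ("q2", "⊔", "N"),   # erase it; then (q2, ⊔) halts
}


def add(n, m, max_steps=10_000):
    """Run ADD on I^n ⊔ I^m and return the output as a string."""
    tape = ["▷"] + ["I"] * n + ["⊔"] + ["I"] * m + ["⊔"]
    head, state = 1, "q0"
    for _ in range(max_steps):
        key = (state, tape[head])
        if key not in ADD:
            return "".join(tape[1:]).strip("⊔")
        state, symbol, move = ADD[key]
        tape[head] = symbol
        if move == "L":
            head = max(head - 1, 0)
        elif move == "R":
            head += 1
            if head == len(tape):
                tape.append("⊔")
    raise RuntimeError("step bound reached before halting")


print(add(2, 3))
```

Writing one stroke and erasing one stroke leaves the total number of strokes unchanged, which is why the result is a single block of n + m strokes, including when n or m is 0.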


Figure 14.3: A machine computing f(x, y) = x + y


In Example 14.4, we gave an example of a Turing machine
that takes as input a sequence of I’s and halts with a sequence of
twice as many I’s on the tape: the doubler machine. However,
because the output contains ⊔’s to the left of the doubled block
of I’s, it does not actually compute the function f(x) = 2x, as
you might have assumed. We’ll describe two ways of fixing that.


Example 14.13. The machine in Figure 14.4 computes the
function f(x) = 2x. Instead of erasing the input and writing two I’s
at the far right for every I in the input as the machine from
Example 14.4 does, this machine adds a single I to the right for
every I in the input. It has to keep track of where the input ends,
so it leaves a ⊔ between the input and the added strokes, which it
fills with an I at the very end. And we have to “remember” where
we are in the input, so we temporarily replace an I in the input
block by a ⊔.


Example 14.14. A second possibility for computing f(x) = 2x
is to keep the original doubler machine, but add states and
instructions at the end which move the doubled block of strokes to
the far left of the tape. The machine in Figure 14.5 does just this
last part: started on a tape consisting of a block of ⊔’s followed
by a block of I’s (and the head positioned anywhere in the block
of ⊔’s), it erases the I’s one at a time and writes them at the
beginning of the tape. In order to be able to tell when it is done, it
first marks the end of the block of I’s with a ▷ symbol, which gets
deleted at the end. We’ve started numbering the states at q₆, so
they can be added to the doubler machine. All you’ll need is an
additional instruction δ(q₅, ⊔) = (q₆, ⊔, N), i.e., an arrow from q₅
to q₆ labelled ⊔,⊔,N. (There is one subtle problem: the resulting
machine does not work for input x = 0. We’ll leave this as an
exercise.)


Figure 14.4: A machine computing f(x) = 2x


Definition 14.15. A Turing machine M computes the partial
function f: ℕ^k ⇸ ℕ iff,


1. M halts on input I^{n₁} ⌢ ⊔ ⌢ ... ⌢ ⊔ ⌢ I^{nₖ} with output
I^m if f(n₁, ..., nₖ) = m.


2. M does not halt at all, or halts with an output that is not a
single block of I’s, if f(n₁, ..., nₖ) is undefined.


Figure 14.5: Moving a block of I’s to the left


14.6 Halting States 


Although we have defined our machines to halt only when there
is no instruction to carry out, common representations of Turing
machines have a dedicated halting state h, such that h ∈ Q.

The idea behind a halting state is simple: when the machine
has finished operation (it is ready to accept input, or has finished
writing the output), it goes into a state h where it halts. Some
machines have two halting states, one that accepts input and one
that rejects input.


Example 14.16. Halting States. To elucidate this concept, let us
begin with an alteration of the even machine. Instead of having
the machine halt in state q₀ if the input is even, we can add an
instruction to send the machine into a halting state.


start → (q₀)
(q₀) --I,I,R--> (q₁)
(q₁) --I,I,R--> (q₀)
(q₁) --⊔,⊔,R--> (q₁)
(q₀) --⊔,⊔,N--> (h)


Let us further expand the example. When the machine
determines that the input is odd, it never halts. We can alter the
machine to include a reject state by replacing the looping
instruction with an instruction to go to a reject state r.


start → (q₀)
(q₀) --I,I,R--> (q₁)
(q₁) --I,I,R--> (q₀)
(q₀) --⊔,⊔,N--> (h)
(q₁) --⊔,⊔,N--> (r)

Adding a dedicated halting state can be advantageous in 
cases like this, where it makes explicit when the machine ac- 
cepts/rejects certain inputs. However, it is important to note 
that no computing power is gained by adding a dedicated halting 
state. Similarly, a less formal notion of halting has its own
advantages. The definition of halting used so far in this chapter
makes the proof that the Halting Problem is unsolvable intuitive and
easy to demonstrate. For this reason, we continue with our original
definition.


14.7 Disciplined Machines


In section 14.6, we considered Turing machines that have
a single, designated halting state h; such machines are
guaranteed to halt, if they halt at all, in state h. In this way,
machines with a single halting state are more “disciplined” than we
allow Turing machines in general to be. There are other restrictions
we might impose on the behavior of Turing machines. For instance,
we also have not prohibited Turing machines from ever erasing
the tape-end marker on square 0, or from attempting to move left
from square 0. (Our definition states that the head simply stays on
square 0 in this case; other definitions have the machine halt.) It
is likewise sometimes desirable to be able to assume that a Turing
machine, if it halts at all, halts on square 1.


Definition 14.17. A Turing machine M is disciplined iff 
1. it has a designated single halting state h, 
2. it halts, if it halts at all, while scanning square 1, 


3. it never erases the ▷ symbol on square 0, and 


4. it never attempts to move left from square 0. 


We have already discussed that any Turing machine can be
changed into one with the same behavior but with a designated
halting state. This is done simply by adding a new state h, and
adding an instruction δ(q, σ) = (h, σ, N) for any pair (q, σ) where
the original δ is undefined. It is true, although tedious to prove,
that any Turing machine M can be turned into a disciplined Turing
machine M′ which halts on the same inputs and produces
the same output. For instance, if the Turing machine halts and
is not on square 1, we can add some instructions to make the
head move left until it finds the tape-end marker, then move one
square to the right, then halt. We’ll leave you to think about how
the other conditions can be dealt with.


Figure 14.6: A disciplined addition machine
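
The first step, adding a designated halting state, is simple enough to write out as a program. This sketch covers only that step, not the other conditions for disciplined machines; the function name add_halting_state is our own, and it assumes that the name chosen for the new state is not already in use.

```python
def add_halting_state(states, alphabet, delta, halt="h"):
    """Return (states, delta) of a machine that enters the new state
    halt wherever the original delta is undefined."""
    assert halt not in states, "the halting state must be new"
    new_delta = dict(delta)
    for q in states:
        for s in alphabet:
            if (q, s) not in delta:
                # Leave tape and head alone, just change state.
                new_delta[(q, s)] = (halt, s, "N")
    # No instructions are added for halt itself, so the machine
    # stops as soon as it gets there.
    return states | {halt}, new_delta


STATES = {"q0", "q1"}
ALPHABET = {"▷", "⊔", "I"}
EVEN = {
    ("q0", "I"): ("q1", "I", "R"),
    ("q1", "I"): ("q0", "I", "R"),
    ("q1", "⊔"): ("q1", "⊔", "R"),
}

new_states, new_delta = add_halting_state(STATES, ALPHABET, EVEN)
print(new_delta[("q0", "⊔")])
```

The new machine carries out exactly the same instructions as the old one, followed by one extra step into h at the point where the old machine would have halted.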


Example 14.18. In Figure 14.6, we turn the addition machine 
from Example 14.12 into a disciplined machine. 


14.8 Combining Turing Machines 


The examples of Turing machines we have seen so far have been
fairly simple in nature. But in fact, any problem that can be solved
with any modern programming language can also be solved with
Turing machines. To build more complex Turing machines, it
is important to convince ourselves that we can combine them,
so we can build machines to solve more complex problems by
breaking the procedure into simpler parts. If we can find a natural
way to break a complex problem down into constituent parts,
we can tackle the problem in several stages, creating several simple
Turing machines and combining them into one machine that
can solve the problem. This point is especially important when
tackling the Halting Problem in the next section.

How do we combine Turing machines M = ⟨Q, Σ, q₀, δ⟩
and M′ = ⟨Q′, Σ′, q′₀, δ′⟩? We now use the configuration of the
tape after M has halted as the input configuration of a run of
machine M′. To get a single Turing machine M ⌢ M′ that does
this, do the following:


1. Renumber (or relabel) all the states Q′ of M′ so that M
and M′ have no states in common (Q ∩ Q′ = ∅).

2. The states of M ⌢ M′ are Q ∪ Q′.

3. The tape alphabet is Σ ∪ Σ′.

4. The start state is q₀.

5. The transition function is the function δ″ given by:

\[
\delta''(q, \sigma) =
\begin{cases}
\delta(q, \sigma) & \text{if } q \in Q \\
\delta'(q, \sigma) & \text{if } q \in Q' \\
\langle q'_0, \sigma, N \rangle & \text{if } q \in Q \text{ and } \delta(q, \sigma) \text{ is undefined}
\end{cases}
\]


The resulting machine uses the instructions of M when it is in a
state q ∈ Q, the instructions of M′ when it is in a state q ∈ Q′.
When it is in a state q ∈ Q and is scanning a symbol σ for which
M has no transition (i.e., M would have halted), it enters the start
state of M′ (and leaves the tape contents and head position as they
are).

Note that unless the machine M is disciplined, we don’t know
where the tape head is when M halts, so the halting configuration
of M need not have the head scanning square 1. When combining
machines, it’s important to keep this in mind.
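The construction of M ⌢ M′ can be sketched as a short Python function. The representation is an assumption made for illustration: a machine is a tuple (states, alphabet, start, delta) with delta a dictionary, and ">" and "_" stand in for ⊳ and ⊔. Instead of renumbering only the states of M′, the sketch tags the states of both machines, which is one simple way to make the two state sets disjoint.

```python
def combine(m1, m2):
    """Build the combined machine from m1 = M and m2 = M'.

    Each machine is (states, alphabet, start, delta), where delta maps
    (state, symbol) to (new_state, new_symbol, move).
    """
    states1, alphabet1, start1, delta1 = m1
    states2, alphabet2, start2, delta2 = m2

    def tag1(q):
        return ("M", q)

    def tag2(q):
        return ("M'", q)

    # Steps 1-4: disjoint states, union of alphabets, start state of M.
    states = {tag1(q) for q in states1} | {tag2(q) for q in states2}
    alphabet = set(alphabet1) | set(alphabet2)
    delta = {}
    # Step 5, first two cases: keep the instructions of M and of M'.
    for (q, s), (r, t, move) in delta1.items():
        delta[(tag1(q), s)] = (tag1(r), t, move)
    for (q, s), (r, t, move) in delta2.items():
        delta[(tag2(q), s)] = (tag2(r), t, move)
    # Step 5, third case: where M would halt, enter the start state of M'
    # without changing the tape or moving the head.
    for q in states1:
        for s in alphabet:
            if (q, s) not in delta1:
                delta[(tag1(q), s)] = (tag2(start2), s, "N")
    return states, alphabet, tag1(start1), delta


# Usage: M scans right over strokes; M' writes one stroke on a blank.
# Both happen to use the state name "a", so the relabeling matters.
m1 = ({"a"}, {">", "_", "I"}, "a", {("a", "I"): ("a", "I", "R")})
m2 = ({"a", "b"}, {">", "_", "I"}, "a", {("a", "_"): ("b", "I", "N")})
states, alphabet, start, delta = combine(m1, m2)
```

Neither example machine is disciplined, so the handover happens wherever M stops, which is the situation the note above warns about.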


Example 14.20. Combining Machines: We’ll design a machine
which, when started on input consisting of two blocks of I’s of
length n and m, halts with a single block of 2(m + n) I’s on the
tape. In order to build this machine, we can combine two machines
we are already familiar with: the addition machine, and
the doubler. We begin by drawing a state diagram for the addition
machine.


[State diagram of the addition machine]


Instead of halting in state q₂, we want to continue operation in order
to double the output. Recall that the doubler machine erases
the first stroke in the input and writes two strokes in a separate
output. Let’s add an instruction to make sure the tape head is
reading the first stroke of the output of the addition machine.


[State diagram of the addition machine with the added instruction]


It is now easy to double the input: all we have to do is connect
the doubler machine onto state q₄. This requires renaming
the states of the doubler machine so that they start at q₄ instead
of q₀; this way we don’t end up with two starting states. The
final diagram should look as in Figure 14.7.


Figure 14.7: Combining adder and doubler machines


Proof. Since M is disciplined, when it halts with output
f(n₁, …, nₖ) = m, the head is scanning square 1. If we
now enter the start state of M′, the machine will halt with output
f′(m), again scanning square 1. The other conditions of
Definition 14.17 are also satisfied. □


14.9 Variants of Turing Machines


There are in fact many possible ways to define Turing machines,
of which ours is only one. In some ways, our definition is more
liberal than others. We allow arbitrary finite alphabets; a more
restricted definition might allow only two tape symbols, I and ⊔.
We allow the machine to write a symbol to the tape and move at
the same time; other definitions allow either writing or moving.
We allow the possibility of writing without moving the tape head;
other definitions leave out the N “instruction.” In other ways,
our definition is more restrictive. We assumed that the tape is
infinite in one direction only; other definitions allow the tape to
be infinite both to the left and the right. In fact, one can even
allow any number of separate tapes, or even an infinite grid of
squares. We represent the instruction set of the Turing machine
by a transition function; other definitions use a transition relation
where the machine has more than one possible instruction in any
given situation.

This last relaxation of the definition is particularly interesting.
In our definition, when the machine is in state q reading
symbol σ, δ(q, σ) determines what the new symbol, state, and
tape head position are. But if we allow the instruction set to be a
relation between current state-symbol pairs ⟨q, σ⟩ and new
state-symbol-direction triples ⟨q′, σ′, D⟩, the action of the Turing
machine may not be uniquely determined: the instruction relation
may contain both ⟨q, σ, q′, σ′, D⟩ and ⟨q, σ, q″, σ″, D′⟩. In this
case we have a non-deterministic Turing machine. These play an
important role in computational complexity theory.

There are also different conventions for when a Turing machine
halts: we say it halts when the transition function is undefined;
other definitions require the machine to be in a special
designated halting state. We have explained in section 14.6 why
requiring a designated halting state is not a restriction which impacts
what Turing machines can compute. Since the tapes of our
Turing machines are infinite in one direction only, there are cases
where a Turing machine can’t properly carry out an instruction:
if it reads the leftmost square and is supposed to move left. According
to our definition, it just stays put instead of “falling off”,
but we could have defined it so that it halts when that happens.
This definition is also equivalent: we could simulate the behavior
of a Turing machine that halts when it attempts to move left from
square 0 by deleting every transition δ(q, ⊳) = ⟨q′, σ, L⟩; then
instead of attempting to move left on ⊳ the machine halts.*

There are also different ways of representing numbers (and
hence the input-output function computed by a Turing machine):
we use unary representation, but you can also use binary representation.
This requires two symbols in addition to ⊔ and ⊳.

Now here is an interesting fact: none of these variations matters
as to which functions are Turing computable. If a function is
Turing computable according to one definition, it is Turing computable
according to all of them.

We won’t go into the details of verifying this. Here’s just one 
example: we gain no additional computing power by allowing a 
tape that is infinite in both directions, or multiple tapes. The 
reason is, roughly, that a Turing machine with a single one-way 
infinite tape can simulate multiple or two-way infinite tapes. E.g., 
using additional states and instructions, we can “translate” a pro- 
gram for a machine with multiple tapes or two-way infinite tape 
into one with a single one-way infinite tape. The translated ma- 
chine can use the even squares for the squares of tape 1 (or the 
“positive” squares of a two-way infinite tape) and the odd squares 
for the squares of tape 2 (or the “negative” squares). 
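The even/odd interleaving for a two-way infinite tape can be written down as a pair of index maps. The following is a minimal sketch, with hypothetical function names; it shows only the bookkeeping of square positions and ignores the tape-end marker on square 0 of our machines, which a real translation would have to shift around.

```python
def fold(i):
    """Map square i of a two-way infinite tape (any integer) to a square
    of a one-way infinite tape: non-negative squares go to even squares,
    negative squares go to odd squares."""
    return 2 * i if i >= 0 else -2 * i - 1


def unfold(n):
    """Inverse of fold: recover the two-way square from the one-way square."""
    return n // 2 if n % 2 == 0 else -(n + 1) // 2


# Usage: the squares -3, ..., 3 land on the one-way squares 0, ..., 6.
folded = [fold(i) for i in range(-3, 4)]
```

Since fold is a bijection between the integers and the natural numbers, no square of the two-way tape is lost, and a simulated move by one square becomes a move by two squares (with special care when crossing square 0).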


*This doesn’t quite work, since nothing prevents us from writing and reading
⊳ on squares other than square 0 (see Example 14.14). We can get around
that by adding a second ⊳′ symbol to use instead for such a purpose.


14.10 The Church-Turing Thesis


Turing machines are supposed to be a precise replacement for
the concept of an effective procedure. Turing thought that anyone
who grasped both the concept of an effective procedure and
the concept of a Turing machine would have the intuition that
anything that could be done via an effective procedure could be
done by a Turing machine. This claim is given support by the fact
that all the other proposed precise replacements for the concept
of an effective procedure turn out to be extensionally equivalent
to the concept of a Turing machine, that is, they can compute
exactly the same set of functions. This claim is called the
Church-Turing thesis.


Definition 14.22 (Church-Turing thesis). The Church-Turing
Thesis states that anything computable via an effective procedure
is Turing computable.


The Church-Turing thesis is appealed to in two ways. The first
kind of use of the Church-Turing thesis is an excuse for laziness.
Suppose we have a description of an effective procedure to compute
something, say, in “pseudo-code.” Then we can invoke the
Church-Turing thesis to justify the claim that the same function
is computed by some Turing machine, even if we have not in fact
constructed it.

The other use of the Church-Turing thesis is more philosophically
interesting. It can be shown that there are functions which
cannot be computed by Turing machines. From this, using the
Church-Turing thesis, one can conclude that such a function cannot
be effectively computed, using any procedure whatsoever. For if there
were such a procedure, by the Church-Turing thesis, it would follow
that there would be a Turing machine for it. So if we can
prove that there is no Turing machine that computes it, there also
can’t be an effective procedure. In particular, the Church-Turing
thesis is invoked to claim that the so-called halting problem not
only cannot be solved by Turing machines, it cannot be effectively
solved at all.


Summary


A Turing machine is a kind of idealized computation mecha- 
nism. It consists of a one-way infinite tape, divided into squares, 
each of which can contain a symbol from a pre-determined al- 
phabet. The machine operates by moving a read-write head 
along the tape. It may also be in one of a pre-determined num- 
ber of states. The actions of the read-write head are determined 
by a set of instructions; each instruction is conditional on the ma- 
chine being in a certain state and reading a certain symbol, and 
specifies which symbol the machine will write onto the current 
square, whether it will move the read-write head one square left, 
right, or stay put, and which state it will switch to. If the tape 
contains a certain input, represented as a sequence of symbols 
on the tape, and the machine is put into the designated start state 
with the read-write head reading the leftmost square of the input, 
the instruction set will step-wise determine a sequence of config- 
urations of the machine: content of tape, position of read-write 
head, and state of the machine. Should the machine encounter 
a configuration in which the instruction set does not contain an 
instruction for the current symbol read/state combination, the 
machine halts, and the content of the tape is the output. 

Numbers can very easily be represented as sequences of
strokes on the tape of a Turing machine. We say a function
f: N → N is Turing computable if there is a Turing machine
which, whenever it is started on the unary representation of n
as input, eventually halts with its tape containing the unary representation
of f(n) as output. Many familiar arithmetical functions
are easily (or not-so-easily) shown to be Turing computable.
Many models of computation other than Turing machines
have been proposed, and it has always turned out that the arithmetical
functions computable there are also Turing computable.
This is seen as support for the Church-Turing Thesis, that every
arithmetical function that can effectively be computed is Turing
computable.




Problems 


Problem 14.1. Choose an arbitrary input and trace through the
configurations of the doubler machine in Example 14.4.


Problem 14.2. Design a Turing machine with alphabet
{⊳, ⊔, A, B} that accepts, i.e., halts on, any string of A’s and
B’s where the number of A’s is the same as the number of B’s and
all the A’s precede all the B’s, and rejects, i.e., does not halt on,
any string where the number of A’s is not equal to the number
of B’s or the A’s do not precede all the B’s. (E.g., the machine
should accept AABB, and AAABBB, but reject both AAB and
AABBAABB.)


Problem 14.3. Design a Turing machine with alphabet
{⊳, ⊔, A, B} that takes as input any string α of A’s and B’s and
duplicates it to produce an output of the form αα. (E.g., input
ABBA should result in output ABBAABBA.)


Problem 14.4. Alphabetical?: Design a Turing machine with alphabet
{⊳, ⊔, A, B} that when given as input a finite sequence of
A’s and B’s checks to see if all the A’s appear to the left of all
the B’s or not. The machine should leave the input string on the
tape, and either halt if the string is “alphabetical”, or loop forever
if the string is not.


Problem 14.5. Alphabetizer: Design a Turing machine with alphabet
{⊳, ⊔, A, B} that takes as input a finite sequence of A’s
and B’s and rearranges them so that all the A’s are to the left of
all the B’s. (E.g., the sequence BABAA should become the sequence
AAABB, and the sequence ABBABB should become the
sequence AABBBB.)


Problem 14.6. Give a definition for when a Turing machine M
computes the function f: Nᵏ → Nᵐ.


Problem 14.7. Trace through the configurations of the machine
from Example 14.12 for input ⟨3, 2⟩. What happens if the machine
computes 0 + 0?


Problem 14.8. In Example 14.14 we described a machine con- 
sisting of a combination of the doubler machine from Figure 14.4 
and the mover machine from Figure 14.5. What happens if you 
start this combined machine on input x = 0, i.e., on an empty 
tape? How would you fix the machine so that in this case the 
machine halts with output 2x = 0? (You should be able to do this 
by adding one state and one transition.) 


Problem 14.9. Subtraction: Design a Turing machine that when
given an input of two non-empty strings of strokes of length n
and m, where n > m, computes the function f(n, m) = n − m.


Problem 14.10. Equality: Design a Turing machine to compute
the following function:

\[
\text{equality}(n, m) =
\begin{cases}
1 & \text{if } n = m \\
0 & \text{if } n \neq m
\end{cases}
\]

where n and m ∈ Z⁺.


Problem 14.11. Design a Turing machine to compute the function
min(x, y) where x and y are positive integers represented on
the tape by strings of I’s separated by a ⊔. You may use additional
symbols in the alphabet of the machine.

The function min selects the smallest value from its arguments,
so min(3, 5) = 3, min(20, 16) = 16, and min(4, 4) = 4, and
so on.


Problem 14.12. Give a disciplined machine that computes
f(x) = x + 1.


Problem 14.13. Find a disciplined machine which, when started
on input Iⁿ, produces output Iⁿ ⌢ ⊔ ⌢ Iⁿ.


Problem 14.14. Give a disciplined Turing machine computing
f(x) = x + 2 by taking the machine M from problem 14.12 and
constructing M ⌢ M.


CHAPTER 15 


Undecidability 


15.1 Introduction 


It might seem obvious that not every function, even every arithmetical
function, can be computable. There are just too many,
whose behavior is too complicated: functions defined from the
decay of radioactive particles, for instance, or other chaotic or
random behavior. Suppose we start counting 1-second intervals
from a given time, and define the function f(n) as the number
of particles in the universe that decay in the n-th 1-second interval
after that initial moment. This seems like a candidate for a
function we cannot ever hope to compute.

But it is one thing to not be able to imagine how one would
compute such functions, and quite another to actually prove that
they are uncomputable. In fact, even functions that seem hopelessly
complicated may, in an abstract sense, be computable. For
instance, suppose the universe is finite in time: some day, in the
very distant future the universe will contract into a single point,
as some cosmological theories predict. Then there is only a finite
(but incredibly large) number of seconds from that initial
moment for which f(n) is defined. And any function which is
defined for only finitely many inputs is computable: we could list
the outputs in one big table, or code it in one very big Turing
machine state transition diagram.


We are often interested in special cases of functions whose
values give the answers to yes/no questions. For instance, the
question “is n a prime number?” is associated with the function

\[
\text{isprime}(n) =
\begin{cases}
1 & \text{if $n$ is prime} \\
0 & \text{otherwise.}
\end{cases}
\]

We say that a yes/no question can be effectively decided, if the associated
1/0-valued function is effectively computable.

To prove mathematically that there are functions which cannot
be effectively computed, or problems that cannot be effectively
decided, it is essential to fix a specific model of computation, and
show that there are functions it cannot compute or problems it
cannot decide. We can show, for instance, that not every function
can be computed by Turing machines, and not every problem
can be decided by Turing machines. We can then appeal to the
Church-Turing thesis to conclude that not only are Turing machines
not powerful enough to compute every function, but no
effective procedure can.

The key to proving such negative results is the fact that we
can assign numbers to Turing machines themselves. The easiest
way to do this is to enumerate them, perhaps by fixing a specific
way to write down Turing machines and their programs, and then
listing them in a systematic fashion. Once we see that this can
be done, then the existence of Turing-uncomputable functions
follows by simple cardinality considerations: the set of functions
from N to N (in fact, even just from N to {0,1}) is uncountable,
but since we can enumerate all the Turing machines, the set of
Turing-computable functions is only countably infinite.

We can also define specific functions and problems which we
can prove to be uncomputable and undecidable, respectively.
One such problem is the so-called Halting Problem. Turing machines
can be finitely described by listing their instructions. Such
a description of a Turing machine, i.e., a Turing machine program,
can of course be used as input to another Turing machine.
So we can consider Turing machines that decide questions about
other Turing machines. One particularly interesting question is
this: “Does the given Turing machine eventually halt when started
on input n?” It would be nice if there were a Turing machine that
could decide this question: think of it as a quality-control Turing
machine which ensures that Turing machines don’t get caught
in infinite loops and such. The interesting fact, which Turing
proved, is that there cannot be such a Turing machine. There
cannot be a single Turing machine which, when started on input
consisting of a description of a Turing machine M and some
number n, will always halt with either output 1 or 0 according to
whether M would have halted when started on input n
or not.

Once we have examples of specific undecidable problems we 
can use them to show that other problems are undecidable, too. 
For instance, one celebrated undecidable problem is the question, 
“Is the first-order formula A valid?”. There is no Turing machine 
which, given as input a first-order formula A, is guaranteed to halt 
with output 1 or 0 according to whether A is valid or not. His- 
torically, the question of finding a procedure to effectively solve 
this problem was called simply “the” decision problem; and so we 
say that the decision problem is unsolvable. Turing and Church 
proved this result independently at around the same time, so it 
is also called the Church-Turing Theorem. 


15.2 Enumerating Turing Machines 


We can show that the set of all Turing machines is countable. This
follows from the fact that each Turing machine can be finitely
described. The set of states and the tape vocabulary are finite
sets. The transition function is a partial function from Q × Σ to
Q × Σ × {L, R, N}, and so likewise can be specified by listing its
values for the finitely many argument pairs for which it is defined.

This is true as far as it goes, but there is a subtle difference.
The definition of Turing machines made no restriction on what
elements the set of states and tape alphabet can have. So, e.g.,
for every real number, there technically is a Turing machine that
uses that number as a state. However, the behavior of the Turing
machine is independent of which objects serve as states and
vocabulary. Consider the two Turing machines in Figure 15.1.
These two diagrams correspond to two machines, M with the
tape alphabet Σ = {⊳, ⊔, I} and set of states {q₀, q₁}, and M′ with
alphabet Σ′ = {⊳, ⊔, A} and states {s, h}. But their instructions
are otherwise the same: M will halt on a sequence of n I’s iff n
is even, and M′ will halt on a sequence of n A’s iff n is even. All
we’ve done is rename I to A, q₀ to s, and q₁ to h. This example
generalizes: we can think of Turing machines as the same as long
as one results from the other by such a renaming of symbols and
states. In fact, we can simply think of the symbols and states of a
Turing machine as positive integers: instead of q₀ think 1, instead
of q₁ think 2, etc.; ⊳ is 1, ⊔ is 2, etc. In this way, the Even machine
becomes the machine depicted in Figure 15.2. We might call a
Turing machine with states and symbols that are positive integers
a standard machine, and only consider standard machines from
now on.*


Figure 15.1: Variants of the Even machine


Figure 15.2: A standard Even machine

We wanted to show that the set of Turing machines is countable,
and with the above considerations in mind, it is enough to
show that the set of standard Turing machines is countable. Suppose
we are given a standard Turing machine M = ⟨Q, Σ, q₀, δ⟩.
How could we describe it using a finite string of positive integers?
We’ll first list the number of states, the states themselves,
the number of symbols, the symbols themselves, and the starting
state. (Remember, all of these are positive integers, since
M is a standard machine.) What about δ? The set of possible
arguments, i.e., pairs ⟨q, σ⟩, is finite, since Q and Σ are finite.
So the information in δ is simply the finite list of all 5-tuples
⟨q, σ, q′, σ′, d⟩ where δ(q, σ) = ⟨q′, σ′, D⟩, and d is a number that
codes the direction D (say, 1 for L, 2 for R, and 3 for N).

In this way, every standard Turing machine can be described
by a finite list of positive integers, i.e., as a sequence s_M ∈ (Z⁺)*.
For instance, the standard Even machine is coded by the sequence

\[
\underbrace{2, 1, 2}_{Q},\;
\overbrace{3, 1, 2, 3}^{\Sigma},\;
1,\;
\underbrace{1, 3, 2, 3, 2}_{\delta(1,3) = \langle 2, 3, R \rangle},\;
\overbrace{2, 2, 2, 2, 2}^{\delta(2,2) = \langle 2, 2, R \rangle},\;
\underbrace{2, 3, 1, 3, 2}_{\delta(2,3) = \langle 1, 3, R \rangle}.
\]
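The coding scheme just described can be checked with a few lines of Python. This is a sketch under assumptions: the function name and the dictionary representation of δ are ours, and the instructions are listed in sorted order of their ⟨q, σ⟩ arguments, which is one of several equally good orders.

```python
def encode(states, symbols, start, delta):
    """Code a standard machine as a list of positive integers:
    number of states, the states, number of symbols, the symbols,
    the start state, then one 5-tuple per instruction."""
    direction = {"L": 1, "R": 2, "N": 3}
    seq = [len(states), *sorted(states), len(symbols), *sorted(symbols), start]
    for (q, s) in sorted(delta):
        r, t, d = delta[(q, s)]
        seq += [q, s, r, t, direction[d]]
    return seq


# Usage: the standard Even machine (states 1, 2; symbols 1, 2, 3).
even_code = encode(
    {1, 2},
    {1, 2, 3},
    1,
    {(1, 3): (2, 3, "R"), (2, 2): (2, 2, "R"), (2, 3): (1, 3, "R")},
)
```

Listing the instructions in a different order would give a different sequence for the same machine, so a machine can have several descriptions.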


*The terminology “standard machine” is not standard.


Proof. We know that the set of finite sequences of positive integers
(Z⁺)* is countable (problem 4.7). This gives us that the set
of descriptions of standard Turing machines, as a subset of (Z⁺)*,
is itself enumerable. Every Turing computable function N to N is
computed by some (in fact, many) Turing machines. By renaming
its states and symbols to positive integers (in particular, ⊳
as 1, ⊔ as 2, and I as 3) we can see that every Turing computable
function is computed by a standard Turing machine. This means
that the set of all Turing computable functions from N to N is
also enumerable.

On the other hand, the set of all functions from N to N is
not countable (problem 4.21). If all functions were computable
by some Turing machine, we could enumerate the set of all functions
by listing all the descriptions of Turing machines that compute
them. So there are some functions that are not Turing computable. □


15.3 Universal Turing Machines 


In section 15.2 we discussed how every Turing machine can be described
by a finite sequence of integers. This sequence encodes
the states, alphabet, start state, and instructions of the Turing
machine. We also pointed out that the set of all of these descriptions
is countable. Since the set of such descriptions is countably
infinite, this means that there is a surjective function from N to
these descriptions. Such a surjective function can be obtained,
for instance, using Cantor’s zig-zag method. It gives us a way of
enumerating all (descriptions of) Turing machines. If we fix one
such enumeration, it now makes sense to talk of the 1st, 2nd, ...,
eth Turing machine. These numbers are called indices.


Definition 15.2. If M is the eth Turing machine (in our fixed
enumeration), we say that e is an index of M. We write Mₑ for
the eth Turing machine.
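To make the idea of an effective enumeration concrete, here is a small Python sketch that lists every non-empty finite sequence of positive integers exactly once. It is not Cantor’s zig-zag method; it orders sequences by the sum of their entries, which is simply another way to get a systematic list. The function names are hypothetical, and the sketch enumerates all sequences, not only those that happen to be well-formed machine descriptions; a real enumeration of machines would skip the rest.

```python
from itertools import count, islice


def compositions(total):
    """Yield every sequence of positive integers whose entries sum to total."""
    if total == 0:
        yield ()
        return
    for first in range(1, total + 1):
        for rest in compositions(total - first):
            yield (first,) + rest


def all_sequences():
    """Enumerate all non-empty finite sequences of positive integers,
    ordered by the sum of their entries."""
    for total in count(1):
        yield from compositions(total)


# Usage: the first seven sequences in this enumeration.
first_seven = list(islice(all_sequences(), 7))
```

Each sequence has a finite sum, and there are only finitely many sequences with any given sum, so every sequence is reached after finitely many steps. The position of a description in such a list can then serve as its index.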


A machine may have more than one index, e.g., two descrip- 
tions of M may differ in the order in which we list its instructions, 
and these different descriptions will have different indices. 

Importantly, it is possible to give the enumeration of Turing
machine descriptions in such a way that we can effectively
compute the description of M from its index, and effectively
compute an index of a machine M from its description. By the
Church-Turing thesis, it is then possible to find a Turing machine
which recovers the description of the Turing machine with index e
and writes the corresponding description on its tape as output.
The description would be a sequence of blocks of I’s (representing
the positive integers in the sequence describing Mₑ).

Given this, it now becomes natural to ask: what functions
of Turing machine indices are themselves computable by Turing
machines? What properties of Turing machine indices can be decided
by Turing machines? An example: the function that maps
an index e to the number of states the Turing machine with index e
has is computable by a Turing machine. Here’s what such
a Turing machine would do: started on a tape containing a single
block of e I’s, it would first decode e into its description. The
description is now represented by a sequence of blocks of I’s on
the tape. The first element in this sequence is the number
of states, so all that has to be done now is to erase everything
but the first block of I’s and then halt.

A remarkable result is the following:


Proof. To actually produce U is basically impossible, since it is an
extremely complicated machine. But we can describe in outline
how it works, and then invoke the Church-Turing thesis. When it
starts, U’s tape contains a block of e I’s followed by a block of
n I’s. It first “decodes” the index e to the right of the input n. This
produces a list of numbers (i.e., blocks of I’s separated by ⊔’s)
that describes the instructions of machine Mₑ. U then writes the
number of the start state of Mₑ and the number 1 on the tape to
the right of the description of Mₑ. (Again, these are represented
in unary, as blocks of I’s.) Next, it copies the input (block of
n I’s) to the right, but it replaces each I by a block of three I’s
(remember, the number of the I symbol is 3, 1 being the number
of ⊳ and 2 being the number of ⊔). At the left end of this sequence
of blocks (separated by ⊔ symbols on the tape of U), it writes a
single I, the code for ⊳.

U now has on its tape: the index e, the number n, the code
number of the start state (the “current state”), the number of
the initial head position 1 (the “current head position”), and the
initial contents of the “tape” (a sequence of blocks of I’s representing
the code numbers of the symbols of Mₑ, the “symbols,”
separated by ⊔’s).

It now simulates what Mₑ would do if started on input n, by
doing the following:


1. Find the number k of the “current head position” (at the
beginning, that’s 1),

2. Move to the kth block in the “tape” to see what the “symbol”
there is,

3. Find the instruction matching the current “state” and “symbol,”

4. Move back to the kth block on the “tape” and replace the
“symbol” there with the code number of the symbol Mₑ
would write,

5. Move the head to where it records the current “state” and
replace the number there with the number of the new state,

6. Move to the place where it records the “tape position” and
erase an I or add an I (if the instruction says to move left or
right, respectively).

7. Repeat.*


If Mₑ started on input n never halts, then U also never halts, so
its output is undefined.

If in step (3) it turns out that the description of Mₑ contains no
instruction for the current “state”/“symbol” pair, then Mₑ would
halt. If this happens, U erases the part of its tape to the left of
the “tape.” For each block of three I’s (representing an I on Mₑ’s
tape), it writes an I on the left end of its own tape, and successively
erases the “tape.” When this is done, U’s tape contains a single
block of I’s of length m (the output of Mₑ on input n).

If U encounters something other than a block of three I’s
on the “tape,” it immediately halts. Since U’s tape in this case
does not contain a single block of I’s, its output is not a natural
number, i.e., f(e, n) is undefined in this case. □
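The simulation loop that U carries out can be mirrored in ordinary Python. The sketch below is an illustration only, with several assumptions: it works directly on a description sequence (as in section 15.2) rather than on an index, it keeps the simulated tape as a Python list of code numbers instead of unary blocks, and it gives up after `max_steps` steps, since a program cannot wait forever for a machine that never halts. The names `decode`, `simulate`, and `max_steps` are ours.

```python
def decode(seq):
    """Split a description sequence into (start_state, delta)."""
    n_states = seq[0]
    pos = 1 + n_states               # skip the list of states
    n_symbols = seq[pos]
    pos += 1 + n_symbols             # skip the list of symbols
    start = seq[pos]
    rest = seq[pos + 1:]
    delta = {}
    for i in range(0, len(rest), 5):
        q, s, r, t, d = rest[i:i + 5]
        delta[(q, s)] = (r, t, d)
    return start, delta


def simulate(seq, n, max_steps=10_000):
    """Run the described machine on input n. Return the output number,
    or None if the machine has not halted within max_steps or halts
    with a tape that is not a single block of strokes."""
    state, delta = decode(seq)
    tape = [1] + [3] * n             # 1 codes the end marker, 3 a stroke
    head = 1                         # the "current head position"
    for _ in range(max_steps):
        if head == len(tape):
            tape.append(2)           # 2 codes a blank square
        key = (state, tape[head])
        if key not in delta:         # no matching instruction: halt
            cells = tape[1:]
            while cells and cells[-1] == 2:
                cells.pop()          # ignore trailing blanks
            return len(cells) if all(c == 3 for c in cells) else None
        state, tape[head], d = delta[key]
        if d == 1:                   # L: stay put on square 0
            head = max(head - 1, 0)
        elif d == 2:                 # R
            head += 1
    return None


# Usage: the standard Even machine halts exactly on even inputs.
EVEN = [2, 1, 2, 3, 1, 2, 3, 1, 1, 3, 2, 3, 2, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2]
```

The step bound is the essential difference from U: the sketch returns None when it runs out of steps, whereas U itself simply never halts in that case.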


*We’re glossing over some subtle difficulties here. E.g., U may need some
extra space when it increases the counter where it keeps track of the “current
head position”; in that case it will have to move the entire “tape” to the right.


15.4 The Halting Problem


Assume we have fixed some enumeration of Turing machine descriptions.
Each Turing machine thus receives an index: its place
in the enumeration M₁, M₂, M₃, ... of Turing machine descriptions.

We know that there must be non-Turing-computable functions:
the set of Turing machine descriptions, and hence the
set of Turing machines, is countable, but the set of all functions
from N to N is not. But we can find specific examples of
non-computable functions as well. One such function is the halting
function.


Definition 15.4 (Halting function). The halting function h is
defined as

\[
h(e, n) =
\begin{cases}
0 & \text{if machine $M_e$ does not halt for input $n$} \\
1 & \text{if machine $M_e$ halts for input $n$}
\end{cases}
\]


Definition 15.5 (Halting problem). The Halting Problem is the
problem of determining (for any e, n) whether the Turing machine
Mₑ halts for an input of n strokes.


We show that h is not Turing-computable by showing that a
related function, s, is not Turing-computable. This proof relies on
the fact that anything that can be computed by a Turing machine
can be computed by a disciplined Turing machine (section 14.7),
and the fact that two Turing machines can be hooked together to
create a single machine (section 14.8).


Definition 15.6. The function s is defined as

\[
s(e) =
\begin{cases}
0 & \text{if machine $M_e$ does not halt for input $e$} \\
1 & \text{if machine $M_e$ halts for input $e$}
\end{cases}
\]


Lemma 15.7. The function s is not Turing computable.

Proof. We suppose, for contradiction, that the function s is Turing
computable. Then there would be a Turing machine S that computes s.
We may assume, without loss of generality, that when S halts, it does
so while scanning the first square (i.e., that it is disciplined).
This machine can be “hooked up” to another machine J, which halts if
it is started on input 0 (i.e., if it reads $\sqcup$ in the initial
state while scanning the square to the right of the end-of-tape
symbol), and otherwise wanders off to the right, never halting.
$S \frown J$, the machine created by hooking S to J, is a Turing
machine, so it is $M_e$ for some e (i.e., it appears somewhere in
the enumeration). Start $M_e$ on an input of e strokes. There are two
possibilities: either $M_e$ halts or it does not halt.

1. Suppose $M_e$ halts for an input of e strokes. Then s(e) = 1. So
   S, when started on e, halts with a single stroke as output on the
   tape. Then J starts with a stroke on the tape. In that case J
   does not halt. But $M_e$ is the machine $S \frown J$, so it should
   do exactly what S followed by J would do (i.e., in this case,
   wander off to the right and never halt). So $M_e$ cannot halt
   for an input of e strokes.

2. Now suppose $M_e$ does not halt for an input of e strokes. Then
   s(e) = 0, and S, when started on input e, halts with a blank
   tape. J, when started on a blank tape, immediately halts.
   Again, $M_e$ does what S followed by J would do, so $M_e$ must
   halt for an input of e strokes.

In each case we arrive at a contradiction with our assumption.
This shows there cannot be a Turing machine S: s is not Turing
computable. □
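
The diagonal construction in this proof can be mimicked in Python. The sketch below is only an analogy, not the Turing machine construction itself: `claims_halts` stands in for the hypothetical machine S, `make_contrary` plays the role of hooking S to J, and a generator that yields forever models "wandering off to the right." All of these names are illustrative assumptions.

```python
def make_contrary(claims_halts):
    """Build a program that asks `claims_halts` about itself and does the opposite."""
    def contrary():
        # Generator-style program: each `yield` is one step; returning means "halts".
        if claims_halts(contrary):
            while True:      # predicted to halt, so run forever
                yield
        # predicted not to halt, so fall through and halt at once
    return contrary


def halts_within(program, max_steps):
    """Run a generator-style program for at most `max_steps` steps."""
    run = program()
    for _ in range(max_steps):
        try:
            next(run)
        except StopIteration:
            return True      # the program finished
    return False             # still running after max_steps steps


# Two candidate "deciders"; each one is wrong about its own contrary program.
always_yes = lambda program: True
always_no = lambda program: False
```

Whatever answer a candidate decider gives about its own contrary program, the program does the opposite, which is the contradiction in cases 1 and 2 above. Note that `halts_within` only observes a bounded number of steps; that the `always_yes` case never halts is known from the construction, not from the bounded run.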


Theorem 15.8 (Unsolvability of the Halting Problem). The halting
problem is unsolvable, i.e., the function h is not Turing computable.

Proof. Suppose h were Turing computable, say, by a Turing machine H.
We could use H to build a Turing machine that computes s: First,
make a copy of the input (separated by a $\sqcup$ symbol). Then move
back to the beginning, and run H. We can clearly make a machine that
does the former (see problem 14.13), and if H existed, we would be
able to “hook it up” to such a copier machine to get a new machine
which would determine if $M_e$ halts on input e, i.e., computes s.
But we’ve already shown that no such machine can exist. Hence, h is
also not Turing computable. □
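
The reduction in this proof is just "copy the input, then run H." As a minimal sketch, assuming a hypothetical function `h(e, n)` were available, s could be obtained as follows. The table `toy_halts` is made-up data for a three-machine toy enumeration and only serves to make the example executable.

```python
def make_s(h):
    """Given a (hypothetical) halting function h(e, n), return s with s(e) = h(e, e)."""
    def s(e):
        return h(e, e)   # duplicate the input e, then ask h
    return s


# Made-up halting behaviour for a toy enumeration; not derived from real machines.
toy_halts = {(0, 0): 1, (1, 1): 0, (2, 2): 1}
toy_h = lambda e, n: toy_halts.get((e, n), 0)

s = make_s(toy_h)
```

Since s is not Turing computable, no Turing computable h can be plugged in here.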


15.5 The Decision Problem 


We say that first-order logic is decidable iff there is an effective 
method for determining whether or not a given sentence is valid. 
As it turns out, there is no such method: the problem of deciding 
validity of first-order sentences is unsolvable. 

In order to establish this important negative result, we prove 
that the decision problem cannot be solved by a Turing machine. 
That is, we show that there is no Turing machine which, when- 
ever it is started on a tape that contains a first-order sentence, 
eventually halts and outputs either 1 or 0 depending on whether 
the sentence is valid or not. By the Church-Turing thesis, every
function which is computable is Turing computable. So if this 
“validity function” were effectively computable at all, it would be 
Turing computable. If it isn’t Turing computable, then, it also 
cannot be effectively computable. 

Our strategy for proving that the decision problem is unsolvable
is to reduce the halting problem to it. This means the following:
We have proved that the function h(e,w), whose value is 1 if the
Turing machine described by e halts on input w and 0 otherwise, is
not Turing computable. We will show that if there were a Turing
machine that decides validity of first-order sentences, then there
would also be a Turing machine that computes h. Since h cannot be
computed by a Turing machine, there cannot be a Turing machine that
decides validity either.

The first step in this strategy is to show that for every input w
and a Turing machine M, we can effectively describe a sentence
T(M,w) representing the instruction set of M and the input w
and a sentence E(M,w) expressing “M eventually halts” such
that:

    $\vDash T(M,w) \to E(M,w)$ iff M halts for input w.

The bulk of our proof will consist in describing these sentences
T(M,w) and E(M,w) and in verifying that $T(M,w) \to E(M,w)$
is valid iff M halts on input w.


15.6 Representing Turing Machines 


In order to represent Turing machines and their behavior by 
a sentence of first-order logic, we have to define a suitable lan- 
guage. The language consists of two parts: predicate symbols 
for describing configurations of the machine, and expressions 
for numbering execution steps (“moments”) and positions on the 
tape. 

We introduce two kinds of predicate symbols, both of them
2-place: For each state q, a predicate symbol $Q_q$, and for each
tape symbol $\sigma$, a predicate symbol $S_\sigma$. The former allow
us to describe the state of M and the position of its tape head, the
latter allow us to describe the contents of the tape.

In order to express the positions of the tape head and the
number of steps executed, we need a way to express numbers.
This is done using a constant symbol 0, and a 1-place function
symbol ′, the successor function. By convention it is written after
its argument (and we leave out the parentheses). So 0 names the
leftmost position on the tape as well as the time before the first
execution step (the initial configuration), 0′ names the square to
the right of the leftmost square, and the time after the first
execution step, and so on. We also introduce a predicate symbol < to
express both the ordering of tape positions (when it means “to the
left of”) and execution steps (then it means “before”).

Once we have the language in place, we list the “axioms” of
T(M,w), i.e., the sentences which, taken together, describe the
behavior of M when run on input w. There will be sentences
which lay down conditions on 0, ′, and <, sentences that describe
the input configuration, and sentences that describe what the
configuration of M is after it executes a particular instruction.


Definition 15.9. Given a Turing machine
$M = \langle Q, \Sigma, q_0, \delta \rangle$, the language $L_M$
consists of:

1. A two-place predicate symbol $Q_q(x,y)$ for every state $q \in Q$.
   Intuitively, $Q_q(\overline{m},\overline{n})$ expresses “after n
   steps, M is in state q scanning the mth square.”

2. A two-place predicate symbol $S_\sigma(x,y)$ for every symbol
   $\sigma \in \Sigma$. Intuitively, $S_\sigma(\overline{m},\overline{n})$
   expresses “after n steps, the mth square contains symbol $\sigma$.”

3. A constant symbol 0

4. A one-place function symbol ′

5. A two-place predicate symbol <

For each number n there is a canonical term $\overline{n}$, the
numeral for n, which represents it in $L_M$. $\overline{0}$ is 0,
$\overline{1}$ is 0′, $\overline{2}$ is 0′′, and so on. More formally:

\[ \overline{0} = 0, \qquad \overline{n+1} = \overline{n}' \]
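
The numeral for n can be produced mechanically. The following minimal sketch writes numerals as ASCII strings, using `'` for the successor symbol; the function name `numeral` and the string representation are assumptions made for illustration only.

```python
def numeral(n):
    """Return the numeral for n: the constant 0 followed by n successor marks."""
    if n < 0:
        raise ValueError("numerals are only defined for natural numbers")
    return "0" + "'" * n
```

For example, `numeral(2)` builds the term corresponding to 0′′.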


The sentences describing the operation of the Turing machine M on
input $w = \sigma_{i_1} \dots \sigma_{i_k}$ are the following:

1. Axioms describing numbers and <:

   a) A sentence that says that every number is less than its
      successor:
      \[ \forall x\, x < x' \]

   b) A sentence that ensures that < is transitive:
      \[ \forall x\, \forall y\, \forall z\, ((x < y \land y < z) \to x < z) \]

2. Axioms describing the input configuration:

   a) After 0 steps (before the machine starts), M is in the
      initial state $q_0$, scanning square 1:
      \[ Q_{q_0}(\overline{1}, \overline{0}) \]

   b) The first k + 1 squares contain the symbols $\triangleright$,
      $\sigma_{i_1}$, ..., $\sigma_{i_k}$:
      \[ S_{\triangleright}(\overline{0},\overline{0}) \land S_{\sigma_{i_1}}(\overline{1},\overline{0}) \land \dots \land S_{\sigma_{i_k}}(\overline{k},\overline{0}) \]

   c) Otherwise, the tape is empty:
      \[ \forall x\, (\overline{k} < x \to S_{\sqcup}(x,\overline{0})) \]
3. Axioms describing the transition from one configuration to
   the next:

   For the following, let A(x,y) be the conjunction of all
   sentences of the form
   \[ \forall z\, (((z < x \lor x < z) \land S_\sigma(z,y)) \to S_\sigma(z,y')) \]
   where $\sigma \in \Sigma$. We use $A(\overline{m},\overline{n})$
   to express “other than at square m, the tape after n + 1 steps is
   the same as after n steps.”

   a) For every instruction
      $\delta(q_i,\sigma) = \langle q_j,\sigma',R \rangle$, the sentence:
      \[ \forall x\, \forall y\, ((Q_{q_i}(x,y) \land S_\sigma(x,y)) \to (Q_{q_j}(x',y') \land S_{\sigma'}(x,y') \land A(x,y))) \]

      This says that if, after y steps, the machine is in state $q_i$
      scanning square x which contains symbol $\sigma$, then after
      y + 1 steps it is scanning square x + 1, is in state $q_j$,
      square x now contains $\sigma'$, and every square other
      than x contains the same symbol as it did after y steps.

   b) For every instruction
      $\delta(q_i,\sigma) = \langle q_j,\sigma',L \rangle$, the sentence:
      \[ \begin{aligned} & \forall x\, \forall y\, ((Q_{q_i}(x',y) \land S_\sigma(x',y)) \to (Q_{q_j}(x,y') \land S_{\sigma'}(x',y') \land A(x,y))) \land {} \\ & \forall y\, ((Q_{q_i}(0,y) \land S_\sigma(0,y)) \to (Q_{q_j}(0,y') \land S_{\sigma'}(0,y') \land A(0,y))) \end{aligned} \]

      Take a moment to think about how this works: now
      we don’t start with “if scanning square x...” but: “if
      scanning square x + 1...” A move to the left means
      that in the next step the machine is scanning square x.
      But the square that is written on is x + 1. We do it this
      way since we don’t have subtraction or a predecessor
      function.

      Note that numbers of the form x + 1 are 1, 2, ..., i.e.,
      this doesn’t cover the case where the machine is scanning
      square 0 and is supposed to move left (which of course it
      can’t; it just stays put). That special case is covered by the
      second conjunct: it says that if, after y steps, the machine is
      scanning square 0 in state $q_i$ and square 0 contains symbol
      $\sigma$, then after y + 1 steps it’s still scanning square 0,
      is now in state $q_j$, the symbol on square 0 is $\sigma'$, and
      the squares other than square 0 contain the same symbols they
      contained after y steps.

   c) For every instruction
      $\delta(q_i,\sigma) = \langle q_j,\sigma',N \rangle$, the sentence:
      \[ \forall x\, \forall y\, ((Q_{q_i}(x,y) \land S_\sigma(x,y)) \to (Q_{q_j}(x,y') \land S_{\sigma'}(x,y') \land A(x,y))) \]


Let T(M,w) be the conjunction of all the above sentences for 
Turing machine M and input w. 
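
Because T(M,w) is assembled mechanically from the instruction set and the input, it can be generated by a program. The sketch below is one possible rendering under several assumptions: sentences are plain ASCII strings (`forall`, `&`, `->`, `'` for successor), `A(x,y)` is left as an abbreviation rather than expanded, the machine is given as a dictionary `delta`, and the names `>` and `blank` stand in for the end-of-tape and blank symbols. None of these names come from the text.

```python
def numeral(n):
    """Numeral for n as a string: 0 followed by n successor marks."""
    return "0" + "'" * n


NUMBER_AXIOMS = [
    "forall x x < x'",
    "forall x forall y forall z ((x < y & y < z) -> x < z)",
]


def input_axioms(q0, w, end_marker=">", blank="blank"):
    """Axioms (2a)-(2c): initial state, tape contents, blank rest of tape."""
    squares = [f"S_{end_marker}(0,0)"]
    squares += [f"S_{s}({numeral(i)},0)" for i, s in enumerate(w, start=1)]
    return [
        f"Q_{q0}({numeral(1)},0)",
        " & ".join(squares),
        f"forall x ({numeral(len(w))} < x -> S_{blank}(x,0))",
    ]


def transition_axiom(q_i, sigma, q_j, sigma_new, move):
    """Axiom (3a), (3b) or (3c) for delta(q_i, sigma) = (q_j, sigma_new, move)."""
    def rule(quantifiers, now, nxt, unchanged):
        return (f"{quantifiers} ((Q_{q_i}({now},y) & S_{sigma}({now},y)) -> "
                f"(Q_{q_j}({nxt},y') & S_{sigma_new}({now},y') & A({unchanged},y)))")

    if move == "R":
        return rule("forall x forall y", "x", "x'", "x")
    if move == "N":
        return rule("forall x forall y", "x", "x", "x")
    if move == "L":
        # general case for squares x' plus the special case for square 0
        return (rule("forall x forall y", "x'", "x", "x") + " & "
                + rule("forall y", "0", "0", "0"))
    raise ValueError(f"unknown move {move!r}")


def build_T(delta, q0, w):
    """List the conjuncts of T(M, w) for a machine given by its transition table."""
    axioms = NUMBER_AXIOMS + input_axioms(q0, w)
    for (q_i, sigma), (q_j, sigma_new, move) in delta.items():
        axioms.append(transition_axiom(q_i, sigma, q_j, sigma_new, move))
    return axioms


# A toy machine that moves right across strokes; it has no instruction for blanks.
toy_delta = {("q0", "1"): ("q0", "1", "R")}
toy_T = build_T(toy_delta, "q0", "11")
```

The point of the sketch is only that the map from M and w to the conjuncts of T(M,w) is effective; the concrete string syntax is incidental.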

In order to express that M eventually halts, we have to find 
a sentence that says “after some number of steps, the transition
function will be undefined.” Let X be the set of all pairs
$\langle q,\sigma \rangle$ such that $\delta(q,\sigma)$ is undefined.
Let E(M,w) then be the sentence

\[ \exists x\, \exists y\, \Bigl( \bigvee_{\langle q,\sigma \rangle \in X} (Q_q(x,y) \land S_\sigma(x,y)) \Bigr) \]

If we use a Turing machine with a designated halting state h,
it is even easier: then the sentence E(M,w)

\[ \exists x\, \exists y\, Q_h(x,y) \]

expresses that the machine eventually halts.
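
The sentence E(M,w) can likewise be read off the transition table. The following sketch assumes the machine is given by lists `states` and `alphabet` and a dictionary `delta`, and writes the sentence as an ASCII string with `|` for disjunction; these representations are illustrative assumptions.

```python
def halting_sentence(states, alphabet, delta):
    """Build E(M, w): some state/symbol pair without an instruction is reached."""
    # X is the set of pairs (q, sigma) for which delta is undefined.
    undefined = [(q, s) for q in states for s in alphabet if (q, s) not in delta]
    disjuncts = [f"(Q_{q}(x,y) & S_{s}(x,y))" for q, s in undefined]
    # An empty disjunction is never true; write it as "false".
    body = " | ".join(disjuncts) or "false"
    return f"exists x exists y ({body})"


toy_delta = {("q0", "1"): ("q1", "1", "R"), ("q0", "blank"): ("q1", "1", "N")}
toy_E = halting_sentence(["q0", "q1"], ["1", "blank"], toy_delta)
```

In the toy machine, state `q1` has no instructions, so both pairs involving `q1` appear as disjuncts.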


Proposition 15.10. If m < k, then
$T(M,w) \vDash \overline{m} < \overline{k}$.

Proof. Exercise. □


15.7 Verifying the Representation


In order to verify that our representation works, we have to prove
two things. First, we have to show that if M halts on input w,
then $T(M,w) \to E(M,w)$ is valid. Then, we have to show the
converse, i.e., that if $T(M,w) \to E(M,w)$ is valid, then M does
in fact eventually halt when run on input w.

The strategies for proving these are very different. For the first
result, we have to show that a sentence of first-order logic (namely,
$T(M,w) \to E(M,w)$) is valid. The easiest way to do this is to give
a derivation. Our proof is supposed to work for all M and w,
though, so there isn’t really a single sentence for which we have
to give a derivation, but infinitely many. So the best we can do
is to prove by induction that, whatever M and w look like, and
however many steps it takes M to halt on input w, there will be
a derivation of $T(M,w) \to E(M,w)$.

Naturally, our induction will proceed on the number of steps
M takes before it reaches a halting configuration. In our inductive
proof, we'll establish that for each step n of the run of M
on input w, $T(M,w) \vDash C(M,w,n)$, where C(M,w,n) correctly
describes the configuration of M run on w after n steps. Now if
M halts on input w after, say, n steps, C(M,w,n) will describe a
halting configuration. We'll also show that
$C(M,w,n) \vDash E(M,w)$, whenever C(M,w,n) describes a halting
configuration. So, if M halts on input w, then for some n, M will be
in a halting configuration after n steps. Hence,
$T(M,w) \vDash C(M,w,n)$ where C(M,w,n) describes a halting
configuration, and since in that case $C(M,w,n) \vDash E(M,w)$, we
get that $T(M,w) \vDash E(M,w)$, i.e., that
$\vDash T(M,w) \to E(M,w)$.

The strategy for the converse is very different. Here we assume
that $\vDash T(M,w) \to E(M,w)$ and have to prove that M halts on
input w. From the hypothesis we get that $T(M,w) \vDash E(M,w)$,
i.e., E(M,w) is true in every structure in which T(M,w) is true. So
we'll describe a structure $\mathfrak{M}$ in which T(M,w) is true:
its domain will be $\mathbb{N}$, and the interpretation of all the
$Q_q$ and $S_\sigma$ will be given by the configurations of M during
a run on input w. So, e.g.,
$\mathfrak{M} \vDash Q_q(\overline{m},\overline{n})$ iff M, when run
on input w for n steps, is in state q and scanning square m. Now
since $T(M,w) \vDash E(M,w)$ by hypothesis, and since
$\mathfrak{M} \vDash T(M,w)$ by construction,
$\mathfrak{M} \vDash E(M,w)$. But $\mathfrak{M} \vDash E(M,w)$ iff
there is some $n \in |\mathfrak{M}| = \mathbb{N}$ so that M, run on
input w, is in a halting configuration after n steps.


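Definition 15.11. Let C(M,w,n) be the sentence

\[ Q_q(\overline{m},\overline{n}) \land S_{\sigma_0}(\overline{0},\overline{n}) \land \dots \land S_{\sigma_k}(\overline{k},\overline{n}) \land \forall x\, (\overline{k} < x \to S_{\sqcup}(x,\overline{n})) \]

where q is the state of M at time n, M is scanning square m at
time n, square i contains symbol $\sigma_i$ at time n for
$0 \le i \le k$, and k is the right-most non-blank square of the tape
at time 0, or the right-most square the tape head has visited after
n steps, whichever is greater.

Lemma 15.12. If M run on input w is in a halting configuration
after n steps, then $C(M,w,n) \vDash E(M,w)$.

Proof. Suppose that M halts for input w after n steps. There is
some state q, square m, and symbol $\sigma$ such that:

1. After n steps, M is in state q scanning square m on which
   $\sigma$ appears.

2. The transition function $\delta(q,\sigma)$ is undefined.

C(M,w,n) is the description of this configuration and will include
the clauses $Q_q(\overline{m},\overline{n})$ and
$S_\sigma(\overline{m},\overline{n})$. These clauses together imply
E(M,w):

\[ \exists x\, \exists y\, \Bigl( \bigvee_{\langle q',\sigma' \rangle \in X} (Q_{q'}(x,y) \land S_{\sigma'}(x,y)) \Bigr) \]

since
$Q_q(\overline{m},\overline{n}) \land S_\sigma(\overline{m},\overline{n}) \vDash \bigvee_{\langle q',\sigma' \rangle \in X} (Q_{q'}(\overline{m},\overline{n}) \land S_{\sigma'}(\overline{m},\overline{n}))$,
as $\langle q,\sigma \rangle \in X$. □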


So if M halts for input w, then there is some n such that
$C(M,w,n) \vDash E(M,w)$. We will now show that for any time n,
$T(M,w) \vDash C(M,w,n)$.


Lemma 15.13. For each n, if M has not halted after n steps,
$T(M,w) \vDash C(M,w,n)$.

Proof. Induction basis: If n = 0, then the conjuncts of C(M,w,0)
are also conjuncts of T(M,w), so entailed by it.

Inductive hypothesis: If M has not halted before the nth
step, then $T(M,w) \vDash C(M,w,n)$. We have to show that (unless
C(M,w,n) describes a halting configuration),
$T(M,w) \vDash C(M,w,n+1)$.

Suppose n > 0 and after n steps, M started on w is in state q
scanning square m. Since M does not halt after n steps, there
must be an instruction of one of the following three forms in the
program of M:

1. $\delta(q,\sigma) = \langle q',\sigma',R \rangle$
2. $\delta(q,\sigma) = \langle q',\sigma',L \rangle$
3. $\delta(q,\sigma) = \langle q',\sigma',N \rangle$

We will consider each of these three cases in turn.


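1. Suppose there is an instruction of the form (1). By
   Definition 15.9(3a), this means that

   \[ \forall x\, \forall y\, ((Q_q(x,y) \land S_\sigma(x,y)) \to (Q_{q'}(x',y') \land S_{\sigma'}(x,y') \land A(x,y))) \]

   is a conjunct of T(M,w). This entails the following sentence
   (universal instantiation, $\overline{m}$ for x and $\overline{n}$
   for y):

   \[ (Q_q(\overline{m},\overline{n}) \land S_\sigma(\overline{m},\overline{n})) \to (Q_{q'}(\overline{m}',\overline{n}') \land S_{\sigma'}(\overline{m},\overline{n}') \land A(\overline{m},\overline{n})). \]

   By induction hypothesis, $T(M,w) \vDash C(M,w,n)$, i.e.,

   \[ Q_q(\overline{m},\overline{n}) \land S_{\sigma_0}(\overline{0},\overline{n}) \land \dots \land S_{\sigma_k}(\overline{k},\overline{n}) \land \forall x\, (\overline{k} < x \to S_{\sqcup}(x,\overline{n})) \]

   Since after n steps, tape square m contains $\sigma$, the
   corresponding conjunct is $S_\sigma(\overline{m},\overline{n})$, so
   this entails:

   \[ Q_q(\overline{m},\overline{n}) \land S_\sigma(\overline{m},\overline{n}) \]

   We now get

   \[ \begin{aligned} & Q_{q'}(\overline{m}',\overline{n}') \land S_{\sigma'}(\overline{m},\overline{n}') \land {} \\ & S_{\sigma_0}(\overline{0},\overline{n}') \land \dots \land S_{\sigma_k}(\overline{k},\overline{n}') \land {} \\ & \forall x\, (\overline{k} < x \to S_{\sqcup}(x,\overline{n}')) \end{aligned} \]

   as follows: The first line comes directly from the consequent of
   the preceding conditional, by modus ponens. Each conjunct in the
   middle line (which excludes $S_{\sigma_m}(\overline{m},\overline{n}')$)
   follows from the corresponding conjunct in C(M,w,n) together
   with $A(\overline{m},\overline{n})$.

   If m < k, $T(M,w) \vDash \overline{m} < \overline{k}$
   (Proposition 15.10) and by transitivity of <, we have
   $\forall x\, (\overline{k} < x \to \overline{m} < x)$. If
   m = k, then $\forall x\, (\overline{k} < x \to \overline{m} < x)$
   by logic alone. The last line then follows from the corresponding
   conjunct in C(M,w,n),
   $\forall x\, (\overline{k} < x \to \overline{m} < x)$, and
   $A(\overline{m},\overline{n})$. If m < k, this already is
   C(M,w,n+1).

   Now suppose m = k. In that case, after n + 1 steps, the tape
   head has also visited square k + 1, which now is the right-most
   square visited. So C(M,w,n+1) has a new conjunct,
   $S_{\sqcup}(\overline{k}',\overline{n}')$, and the last conjunct is
   $\forall x\, (\overline{k}' < x \to S_{\sqcup}(x,\overline{n}'))$.
   We have to verify that these two sentences are also implied.
   We already have
   $\forall x\, (\overline{k} < x \to S_{\sqcup}(x,\overline{n}'))$.
   In particular, this gives us
   $\overline{k} < \overline{k}' \to S_{\sqcup}(\overline{k}',\overline{n}')$.
   From the axiom $\forall x\, x < x'$ we get
   $\overline{k} < \overline{k}'$. By modus ponens,
   $S_{\sqcup}(\overline{k}',\overline{n}')$ follows.

   Also, since $T(M,w) \vDash \overline{k} < \overline{k}'$, the axiom
   for transitivity of < gives us
   $\forall x\, (\overline{k}' < x \to S_{\sqcup}(x,\overline{n}'))$.
   (We leave the verification of this as an exercise.)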


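2. Suppose there is an instruction of the form (2). Then, by
   Definition 15.9(3b),

   \[ \begin{aligned} & \forall x\, \forall y\, ((Q_q(x',y) \land S_\sigma(x',y)) \to (Q_{q'}(x,y') \land S_{\sigma'}(x',y') \land A(x,y))) \land {} \\ & \forall y\, ((Q_q(0,y) \land S_\sigma(0,y)) \to (Q_{q'}(0,y') \land S_{\sigma'}(0,y') \land A(0,y))) \end{aligned} \]

   is a conjunct of T(M,w). If m > 0, then let l = m − 1 (i.e.,
   m = l + 1). The first conjunct of the above sentence entails
   the following:

   \[ (Q_q(\overline{l}',\overline{n}) \land S_\sigma(\overline{l}',\overline{n})) \to (Q_{q'}(\overline{l},\overline{n}') \land S_{\sigma'}(\overline{l}',\overline{n}') \land A(\overline{l},\overline{n})) \]

   Otherwise, let l = m = 0 and consider the following sentence
   entailed by the second conjunct:

   \[ (Q_q(0,\overline{n}) \land S_\sigma(0,\overline{n})) \to (Q_{q'}(0,\overline{n}') \land S_{\sigma'}(0,\overline{n}') \land A(0,\overline{n})) \]

   Either sentence implies

   \[ \begin{aligned} & Q_{q'}(\overline{l},\overline{n}') \land S_{\sigma'}(\overline{m},\overline{n}') \land {} \\ & S_{\sigma_0}(\overline{0},\overline{n}') \land \dots \land S_{\sigma_k}(\overline{k},\overline{n}') \land {} \\ & \forall x\, (\overline{k} < x \to S_{\sqcup}(x,\overline{n}')) \end{aligned} \]

   as before. (Note that in the first case,
   $\overline{l}' \equiv \overline{l+1} \equiv \overline{m}$ and in
   the second case $\overline{l} \equiv 0$.) But this just is
   C(M,w,n+1).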


3. Case (3) is left as an exercise. 


We have shown that for any n, $T(M,w) \vDash C(M,w,n)$. □


Lemma 15.14. If M halts on input w, then $T(M,w) \to E(M,w)$ is
valid.

Proof. By Lemma 15.13, we know that, for any time n, the
description C(M,w,n) of the configuration of M at time n is
entailed by T(M,w). Suppose M halts after k steps. At that
point, it will be scanning square m, for some $m \in \mathbb{N}$.
Then C(M,w,k) describes a halting configuration of M, i.e., it
contains as conjuncts both $Q_q(\overline{m},\overline{k})$ and
$S_\sigma(\overline{m},\overline{k})$ with $\delta(q,\sigma)$
undefined. Thus, by Lemma 15.12, $C(M,w,k) \vDash E(M,w)$. But since
$T(M,w) \vDash C(M,w,k)$, we have $T(M,w) \vDash E(M,w)$ and
therefore $T(M,w) \to E(M,w)$ is valid. □


To complete the verification of our claim, we also have to
establish the reverse direction: if $T(M,w) \to E(M,w)$ is valid,
then M does in fact halt when started on input w.


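Lemma 15.15. If $\vDash T(M,w) \to E(M,w)$, then M halts on
input w.

Proof. Consider the $L_M$-structure $\mathfrak{M}$ with domain
$\mathbb{N}$ which interprets 0 as 0, ′ as the successor function,
and < as the less-than relation, and the predicates $Q_q$ and
$S_\sigma$ as follows:

\[ Q_q^{\mathfrak{M}} = \{ \langle m,n \rangle : \text{started on } w \text{, after } n \text{ steps, } M \text{ is in state } q \text{ scanning square } m \} \]

\[ S_\sigma^{\mathfrak{M}} = \{ \langle m,n \rangle : \text{started on } w \text{, after } n \text{ steps, square } m \text{ of } M \text{ contains symbol } \sigma \} \]

In other words, we construct the structure $\mathfrak{M}$ so that it
describes what M started on input w actually does, step by step.
Clearly, $\mathfrak{M} \vDash T(M,w)$. If
$\vDash T(M,w) \to E(M,w)$, then also $\mathfrak{M} \vDash E(M,w)$,
i.e.,

\[ \mathfrak{M} \vDash \exists x\, \exists y\, \Bigl( \bigvee_{\langle q,\sigma \rangle \in X} (Q_q(x,y) \land S_\sigma(x,y)) \Bigr). \]

As $|\mathfrak{M}| = \mathbb{N}$, there must be
$m, n \in \mathbb{N}$ so that
$\mathfrak{M} \vDash Q_q(\overline{m},\overline{n}) \land S_\sigma(\overline{m},\overline{n})$
for some q and $\sigma$ such that $\delta(q,\sigma)$ is undefined. By
the definition of $\mathfrak{M}$, this means that M started on input
w after n steps is in state q and reading symbol $\sigma$, and the
transition function is undefined, i.e., M has halted. □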
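
The structure used in this proof records the actual run of M. A bounded fragment of it can be computed by simulation, as in the sketch below. The tape conventions follow the text (end-of-tape marker on square 0, head starting on square 1, a left move on square 0 stays put); the dictionary-based machine format, the names `run`, `interpretations` and `satisfies_E`, the symbols `>` and `_`, and the step bound are assumptions for illustration. Only squares actually reached are recorded, so this is a finite fragment of the interpretations, not the full structure.

```python
def run(delta, q0, w, max_steps, end_marker=">", blank="_"):
    """Return the configurations (state, head, tape) after 0, 1, ... steps."""
    tape = [end_marker] + list(w)
    state, head = q0, 1
    configs = []
    for _ in range(max_steps + 1):
        if head == len(tape):
            tape.append(blank)             # the tape is blank beyond what was used
        configs.append((state, head, tuple(tape)))
        key = (state, tape[head])
        if key not in delta:               # transition undefined: M has halted
            break
        state, tape[head], move = delta[key]
        if move == "R":
            head += 1
        elif move == "L":
            head = max(head - 1, 0)        # cannot move left of square 0
    return configs


def interpretations(configs):
    """Collect the pairs (m, n) that belong to each Q_q and S_sigma."""
    Q, S = {}, {}
    for n, (state, head, tape) in enumerate(configs):
        Q.setdefault(state, set()).add((head, n))
        for m, symbol in enumerate(tape):
            S.setdefault(symbol, set()).add((m, n))
    return Q, S


def satisfies_E(Q, S, delta):
    """E holds if some (m, n) lies in Q_q and S_sigma with delta(q, sigma) undefined."""
    return any((q, s) not in delta and bool(Q[q] & S[s]) for q in Q for s in S)


# Toy machines: one scans right over strokes and halts on the first blank,
# the other stays put forever on a stroke.
scanner = {("q0", "1"): ("q0", "1", "R")}
looper = {("q0", "1"): ("q0", "1", "N")}

scan_configs = run(scanner, "q0", "11", max_steps=10)
loop_configs = run(looper, "q0", "1", max_steps=10)
```

For the scanning machine the recorded run reaches a pair with no instruction, so E is witnessed; for the looping machine no witness appears within the step bound, which by itself proves nothing about longer runs.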


15.8 The Decision Problem is Unsolvable

Theorem 15.16. The decision problem is unsolvable.

Proof. Suppose the decision problem were solvable, i.e., suppose
there were a Turing machine D that decides validity of first-order
sentences. Then we could solve the halting problem as follows. We
construct a Turing machine E that, given as input the number e of
Turing machine $M_e$ and input w, computes the corresponding
sentence $T(M_e,w) \to E(M_e,w)$ and halts, scanning the leftmost
square on the tape. The machine $E \frown D$ would then, given input
e and w, first compute $T(M_e,w) \to E(M_e,w)$ and then run the
decision problem machine D on that input. D halts with output 1 iff
$T(M_e,w) \to E(M_e,w)$ is valid and outputs 0 otherwise. By
Lemma 15.15 and Lemma 15.14, $T(M_e,w) \to E(M_e,w)$ is valid iff
$M_e$ halts on input w. Thus, $E \frown D$, given input e and w,
halts with output 1 iff $M_e$ halts on input w and halts with
output 0 otherwise. In other words, $E \frown D$ would solve the
halting problem. But we know, by Theorem 15.8, that no such Turing
machine can exist. □


Corollary 15.17. It is undecidable if an arbitrary sentence of
first-order logic is satisfiable.

Proof. Suppose satisfiability were decidable by a Turing machine S.
Then we could solve the decision problem as follows: Given a
sentence B as input, move B to the right one square. Return to
square 1 and write the symbol $\lnot$.

Now run the Turing machine S. It eventually halts with output
either 1 (if $\lnot B$ is satisfiable) or 0 (if $\lnot B$ is
unsatisfiable) on the tape. If there is a stroke on square 1, erase
it; if square 1 is empty, write a stroke, then halt.

This Turing machine always halts, and its output is 1 iff $\lnot B$
is unsatisfiable and 0 otherwise. Since B is valid iff $\lnot B$ is
unsatisfiable, the machine outputs 1 iff B is valid, and 0 otherwise,
i.e., it would solve the decision problem. □
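
The shape of this reduction (negate the input, ask the satisfiability decider, flip the answer) can be tried out where a decider does exist. The sketch below uses propositional logic, which is decidable by truth tables, purely as a stand-in; formulas are represented as Python predicates over a valuation dictionary, which is an assumption of this example. For first-order logic no such `satisfiable` function can exist.

```python
from itertools import product


def satisfiable(formula, variables):
    """Brute-force satisfiability test for a propositional formula."""
    return any(formula(dict(zip(variables, values)))
               for values in product([False, True], repeat=len(variables)))


def valid(formula, variables):
    """B is valid iff not-B is unsatisfiable."""
    negation = lambda valuation: not formula(valuation)   # write the negation sign
    return not satisfiable(negation, variables)           # flip the decider's answer


excluded_middle = lambda v: v["p"] or not v["p"]
just_p = lambda v: v["p"]
```

Here `valid` never inspects the formula itself; it only calls `satisfiable` once, mirroring how the machine in the proof calls S once.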
So there is no Turing machine which always gives a correct
“yes” or “no” answer to the question “Is B a valid sentence of
first-order logic?” However, there is a Turing machine that always
gives a correct “yes” answer, but simply does not halt if the
answer is “no.” This follows from the soundness and completeness
theorem of first-order logic, and the fact that derivations can be
effectively enumerated.


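Theorem 15.18. Validity of first-order sentences is semi-decidable:
There is a Turing machine E which, when started on a tape that
contains a first-order sentence, eventually halts and outputs 1 iff
the sentence is valid, but does not halt otherwise.

Proof. All possible derivations of first-order logic can be
generated, one after another, by an effective algorithm. The
machine E does this, and when it finds a derivation that shows that
$\vdash B$, it halts with output 1. By the soundness theorem, if E
halts with output 1, it’s because $\vDash B$. By the completeness
theorem, if $\vDash B$ there is a derivation that shows that
$\vdash B$. Since E systematically generates all possible
derivations, it will eventually find one that shows $\vdash B$, so
will eventually halt with output 1. □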
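
The search behind semi-decidability has a simple generic shape: enumerate candidates and stop at the first one that works. The sketch below illustrates only that shape. The stand-in for "derivation" is a natural number and the stand-in for "shows that B is derivable" is an arbitrary checkable predicate; both are toy assumptions, not a derivation system.

```python
from itertools import count


def semi_decide(shows, candidates):
    """Return 1 as soon as some candidate passes the check `shows`.

    With an infinite enumeration this never returns when no candidate works,
    which is exactly the "does not halt if the answer is no" behaviour.
    """
    for candidate in candidates:
        if shows(candidate):
            return 1
    return None   # only reachable when the enumeration is finite and exhausted


# Toy check: "d witnesses that 49 is a perfect square".
witnesses_49 = lambda d: d * d == 49
witnesses_50 = lambda d: d * d == 50
```

Calling `semi_decide(witnesses_49, count())` halts once the witness 7 is reached, whereas `semi_decide(witnesses_50, count())` would run forever, so the test below uses a finite range for the negative case.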


15.9 Trakhtenbrot’s Theorem


In section 15.6 we defined sentences T(M,w) and E(M,w) for 
a Turing machine M and input string w. Then we showed in 
Lemma 15.14 and Lemma 15.15 that T(M,w) — E(M,w) is valid 
iff M, started on input w, eventually halts. Since the Halting 
Problem is undecidable, this implies that validity and satisfiability
of sentences of first-order logic are undecidable (Theorem 15.16
and Corollary 15.17).

But validity and satisfiability of sentences are defined for
arbitrary structures, finite or infinite. You might suspect that it
is easier to decide if a sentence is satisfiable in a finite
structure (or valid in all finite structures). We can adapt the
proof of the unsolvability of the decision problem so that it shows
this is not the case.

First, if you go back to the proof of Lemma 15.15, you'll see
that what we did there is produce a model $\mathfrak{M}$ of T(M,w)
which describes exactly what machine M does when started on input w.
The domain of that model was $\mathbb{N}$, i.e., infinite. But if M
actually halts on input w, we can build a finite model
$\mathfrak{M}'$ in the same way. Suppose M started on input w halts
after k steps. Take as domain $|\mathfrak{M}'|$ the set
$\{0, \dots, n\}$, where n is the larger of k and the length of w,
and let

\[ {}'^{\mathfrak{M}'}(x) = \begin{cases} x + 1 & \text{if } x < n \\ n & \text{otherwise,} \end{cases} \]

and $\langle x,y \rangle \in {<}^{\mathfrak{M}'}$ iff x < y or
x = y = n. Otherwise $\mathfrak{M}'$ is defined just like
$\mathfrak{M}$. By the definition of $\mathfrak{M}'$, just like in
the proof of Lemma 15.15, $\mathfrak{M}' \vDash T(M,w)$. And since we
assumed that M halts on input w, $\mathfrak{M}' \vDash E(M,w)$. So,
$\mathfrak{M}'$ is a finite model of $T(M,w) \land E(M,w)$ (note
that we’ve replaced $\to$ with $\land$).
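
The reason for the extra clause x = y = n in the interpretation of < can be checked by brute force. The sketch below builds the truncated successor and ordering on {0, ..., n} and tests only the two axioms about numbers (every number is less than its successor, and < is transitive); it does not check the other conjuncts of T(M,w). Function names are illustrative assumptions.

```python
def truncated_structure(n):
    """Domain {0..n}, successor capped at n, and < extended by the pair (n, n)."""
    domain = range(n + 1)
    succ = lambda x: x + 1 if x < n else n
    less = lambda x, y: x < y or x == y == n
    return domain, succ, less


def number_axioms_hold(n):
    """Check 'x < x-successor' and transitivity of < in the truncated structure."""
    domain, succ, less = truncated_structure(n)
    successor_ok = all(less(x, succ(x)) for x in domain)
    transitive = all(less(x, z)
                     for x in domain for y in domain for z in domain
                     if less(x, y) and less(y, z))
    return successor_ok and transitive


def successor_axiom_without_extra_pair(n):
    """Same check for the successor axiom, but with the ordinary < on {0..n}."""
    domain, succ, _ = truncated_structure(n)
    return all(x < succ(x) for x in domain)
```

With the ordinary ordering the successor axiom fails at the last element, because its successor is itself; adding the single pair of n with itself repairs this without breaking transitivity.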

We are halfway to a proof: we’ve shown that if M halts on
input w, then $T(M,w) \land E(M,w)$ has a finite model.
Unfortunately, the “only if” direction does not hold. For instance,
if M after n steps is in state q and reads a symbol $\sigma$, and
$\delta(q,\sigma) = \langle q,\sigma,N \rangle$, then the
configuration after n + 1 steps is exactly the same as the
configuration after n steps (same state, same head position, same
tape contents). But the machine never halts; it’s in an infinite
loop. The corresponding structure $\mathfrak{M}'$ above satisfies
T(M,w) but not E(M,w). (In it, the values of $\overline{n+l}$ are
all the same, so it is finite.) But by changing T(M,w) in a suitable
way we can rule out structures like this.

Consider the sentences describing the operation of the Turing
machine M on input $w = \sigma_{i_1} \dots \sigma_{i_k}$:

1. Axioms describing numbers and < (just like in the definition
   of T(M,w) in section 15.6).

2. Axioms describing the input configuration: just like in the
   definition of T(M,w).

3. Axioms describing the transition from one configuration to
   the next:

   For the following, let A(x,y) be as before, and let

   \[ B(y) \equiv \forall x\, (x < y \to x \neq y). \]

   a) For every instruction
      $\delta(q_i,\sigma) = \langle q_j,\sigma',R \rangle$, the sentence:
      \[ \forall x\, \forall y\, ((Q_{q_i}(x,y) \land S_\sigma(x,y)) \to (Q_{q_j}(x',y') \land S_{\sigma'}(x,y') \land A(x,y) \land B(y'))) \]

   b) For every instruction
      $\delta(q_i,\sigma) = \langle q_j,\sigma',L \rangle$, the sentence:
      \[ \begin{aligned} & \forall x\, \forall y\, ((Q_{q_i}(x',y) \land S_\sigma(x',y)) \to (Q_{q_j}(x,y') \land S_{\sigma'}(x',y') \land A(x,y))) \land {} \\ & \forall y\, ((Q_{q_i}(0,y) \land S_\sigma(0,y)) \to (Q_{q_j}(0,y') \land S_{\sigma'}(0,y') \land A(0,y) \land B(y'))) \end{aligned} \]

   c) For every instruction
      $\delta(q_i,\sigma) = \langle q_j,\sigma',N \rangle$, the sentence:
      \[ \forall x\, \forall y\, ((Q_{q_i}(x,y) \land S_\sigma(x,y)) \to (Q_{q_j}(x,y') \land S_{\sigma'}(x,y') \land A(x,y) \land B(y'))) \]

   As you can see, the sentences describing the transitions of M
   are the same as the corresponding sentences in T(M,w), except we
   add B(y′) at the end. B(y′) ensures that the number y′ of the
   “next” configuration is different from all previous numbers
   0, 0′, ....


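Let T′(M,w) be the conjunction of all the above sentences for
Turing machine M and input w.

Lemma 15.19. If M started on input w halts, then
$T'(M,w) \land E(M,w)$ has a finite model.

Proof. Let $\mathfrak{M}'$ be as in the proof of Lemma 15.15, except

\[ |\mathfrak{M}'| = \{0, \dots, n\}, \]

\[ {}'^{\mathfrak{M}'}(x) = \begin{cases} x + 1 & \text{if } x < n \\ n & \text{otherwise,} \end{cases} \]

\[ \langle x,y \rangle \in {<}^{\mathfrak{M}'} \text{ iff } x < y \text{ or } x = y = n, \]

where n = max(k, len(w)) and k is the least number such that
M started on input w has halted after k steps. We leave the
verification that $\mathfrak{M}' \vDash T'(M,w) \land E(M,w)$ as an
exercise. □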


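Lemma 15.20. If $T'(M,w) \land E(M,w)$ has a finite model, then M
started on input w halts.

Proof. We show the contrapositive. Suppose that M started on w
does not halt. If $T'(M,w) \land E(M,w)$ has no model at all, we are
done. So assume $\mathfrak{M}$ is a model of $T'(M,w) \land E(M,w)$.
We have to show that it cannot be finite.

We can prove, just like in Lemma 15.13, that if M, started
on input w, has not halted after n steps, then
$T'(M,w) \vDash C(M,w,n) \land B(\overline{n})$. Since M started on
input w does not halt,
$T'(M,w) \vDash C(M,w,n) \land B(\overline{n})$ for all
$n \in \mathbb{N}$. Note that by Proposition 15.10,
$T'(M,w) \vDash \overline{k} < \overline{n}$ for all k < n. Also
$B(\overline{n}) \vDash \overline{k} < \overline{n} \to \overline{k} \neq \overline{n}$.
So, $\mathfrak{M} \vDash \overline{k} \neq \overline{n}$ for all
k < n, i.e., the infinitely many terms $\overline{k}$ must all have
different values in $\mathfrak{M}$. But this requires that
$|\mathfrak{M}|$ be infinite, so $\mathfrak{M}$ cannot be a finite
model of $T'(M,w) \land E(M,w)$. □

Theorem 15.21 (Trakhtenbrot’s Theorem). It is undecidable if an
arbitrary sentence of first-order logic has a finite model (i.e., is
finitely satisfiable).

Proof. Suppose there were a Turing machine F that decides the
finite satisfiability problem. Then given any Turing machine M
and input w, we could compute the sentence $T'(M,w) \land E(M,w)$,
and use F to decide if it has a finite model. By Lemmas 15.19
and 15.20, it does iff M started on input w halts. So we could use
F to solve the halting problem, which we know is unsolvable. □

Corollary 15.22. There can be no derivation system that is sound
and complete for finite validity, i.e., a derivation system which
has $\vdash B$ iff $\mathfrak{M} \vDash B$ for every finite
structure $\mathfrak{M}$.

Proof. Exercise. □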


Summary 


Turing machines are determined by their instruction sets, which 
are finite sets of quintuples (for every state and symbol read, spec- 
ify new state, symbol written, and movement of the head). The 
finite sets of quintuples are enumerable, so there is a way of as- 
sociating a number with each Turing machine instruction set. 
The index of a Turing machine is the number associated with 
its instruction set under a fixed such schema. In this way we can 
“talk about” Turing machines indirectly—by talking about their 
indices. 

One important problem about the behavior of Turing machines is
whether they eventually halt. Let h(e,n) be the function whose
value is 1 if the Turing machine with index e halts when started on
input n, and 0 otherwise. It is called the halting function. The
question of whether the halting function is itself Turing computable
is called the halting problem. The answer is no: the halting problem
is unsolvable. This is established using a diagonal argument.

The halting problem is only one example of a larger class
of problems of the form “can X be accomplished using Turing
machines.” Another central problem of logic is the decision
problem for first-order logic: is there a Turing machine that
can decide if a given sentence is valid or not. This famous problem
was also solved negatively: the decision problem is unsolvable.
This is established by a reduction argument: we can associate with
each Turing machine M and input w a first-order sentence
$T(M,w) \to E(M,w)$ which is valid iff M halts when started
on input w. If the decision problem were solvable, we could thus
use it to solve the halting problem.


Problems 


Problem 15.1. Can you think of a way to describe Turing ma- 
chines that does not require that the states and alphabet symbols 
are explicitly listed? You may define your own notion of “stan- 
dard” machine, but say something about why every Turing ma- 
chine can be computed by a “standard” machine in your new 
sense. 


Problem 15.2. The Three Halting (3-Halt) problem is the prob- 
lem of giving a decision procedure to determine whether or not 
an arbitrarily chosen Turing Machine halts for an input of three 
I’s on an otherwise blank tape. Prove that the 3-Halt problem is 
unsolvable. 


Problem 15.3. Show that if the halting problem is solvable for
Turing machine and input pairs $M_e$ and n where $e \neq n$, then it
is also solvable for the cases where e = n.


Problem 15.4. We proved that the halting problem is unsolvable
if the input is a number e, which identifies a Turing machine $M_e$
via an enumeration of all Turing machines. What if we allow
the description of Turing machines from section 15.2 directly as
input? Can there be a Turing machine which decides the halting
problem but takes as input descriptions of Turing machines rather
than indices? Explain why or why not.


Problem 15.5. Show that the partial function s′ defined as

\[ s'(e) = \begin{cases} 1 & \text{if machine } M_e \text{ halts for input } e \\ \text{undefined} & \text{if machine } M_e \text{ does not halt for input } e \end{cases} \]

is Turing computable.


Problem 15.6. Prove Proposition 15.10. (Hint: use induction on 
k-m). 


Problem 15.7. Complete case (3) of the proof of Lemma 15.13. 


Problem 15.8. Give a derivation of
$S_{\sigma_i}(\overline{i},\overline{n}')$ from
$S_{\sigma_i}(\overline{i},\overline{n})$ and
$A(\overline{m},\overline{n})$ (assuming $i \neq m$, i.e., either
i < m or m < i).


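Problem 15.9. Give a derivation of
$\forall x\, (\overline{k}' < x \to S_{\sqcup}(x,\overline{n}'))$ from
$\forall x\, (\overline{k} < x \to S_{\sqcup}(x,\overline{n}'))$,
$\forall x\, x < x'$, and
$\forall x\, \forall y\, \forall z\, ((x < y \land y < z) \to x < z)$.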


Problem 15.10. Complete the proof of Lemma 15.19 by proving
that $\mathfrak{M}' \vDash T'(M,w) \land E(M,w)$.


Problem 15.11. Complete the proof of Lemma 15.20 by proving
that if M, started on input w, has not halted after n steps, then
$T'(M,w) \vDash B(\overline{n})$.


Problem 15.12. Prove Corollary 15.22. Observe that B is satisfied
in every finite structure iff $\lnot B$ is not finitely satisfiable.
Explain why finite satisfiability is semi-decidable in the sense of 
Theorem 15.18. Use this to argue that if there were a derivation 
system for finite validity, then finite satisfiability would be decid- 
able. 


APPENDIX A 


Proofs 


A.1. Introduction 


Based on your experiences in introductory logic, you might be 
comfortable with a derivation system—probably a natural de- 
duction or Fitch style derivation system, or perhaps a proof-tree 
system. You probably remember doing proofs in these systems, 
either proving a formula or showing that a given argument is valid.
In order to do this, you applied the rules of the system until you 
got the desired end result. In reasoning about logic, we also prove 
things, but in most cases we are not using a derivation system. In 
fact, most of the proofs we consider are done in English (perhaps, 
with some symbolic language thrown in) rather than entirely in 
the language of first-order logic. When constructing such proofs, 
you might at first be at a loss—how do I prove something without 
a derivation system? How do I start? How do I know if my proof 
is correct? 

Before attempting a proof, it’s important to know what a proof 
is and how to construct one. As implied by the name, a proof is 
meant to show that something is true. You might think of this in 
terms of a dialogue—someone asks you if something is true, say, 
if every prime other than two is an odd number. To answer “yes” 
is not enough; they might want to know why. In this case, you’d
give them a proof.

In everyday discourse, it might be enough to gesture at an
answer, or give an incomplete answer. In logic and mathematics,
however, we want rigorous proof: we want to show that something
is true beyond any doubt. This means that every step in our
proof must be justified, and the justification must be cogent (i.e.,
the assumption you’re using is actually assumed in the statement
of the theorem you’re proving, the definitions you apply must be
correctly applied, the justifications appealed to must be correct
inferences, etc.).

Usually, we’re proving some statement. We call the statements 
we're proving by various names: propositions, theorems, lemmas, 
or corollaries. A proposition is a basic proof-worthy statement: 
important enough to record, but perhaps not particularly deep 
nor applied often. A theorem is a significant, important proposi- 
tion. Its proof often is broken into several steps, and sometimes it 
is named after the person who first proved it (e.g., Cantor’s
Theorem, the Löwenheim-Skolem theorem) or after the fact it
concerns (e.g., the completeness theorem). A lemma is a proposition
or theorem that is used in the proof of a more important result. 
Confusingly, sometimes lemmas are important results in them- 
selves, and also named after the person who introduced them 
(e.g., Zorn’s Lemma). A corollary is a result that easily follows 
from another one. 

A statement to be proved often contains assumptions that 
clarify which kinds of things we’re proving something about. It 
might begin with “Let A be a formula of the form B → C” or
“Suppose Γ ⊢ A” or something of the sort. These are hypotheses of
the proposition, theorem, or lemma, and you may assume these
to be true in your proof. They restrict what we’re proving, and
also introduce some names for the objects we’re talking about.
For instance, if your proposition begins with “Let A be a formula
of the form B → C,” you’re proving something about all formulas
of a certain sort only (namely, conditionals), and it’s understood
that B → C is an arbitrary conditional that your proof will talk
about.


A.2 Starting a Proof


But where do you even start? 

You’ve been given something to prove, so this should be the
last thing that is mentioned in the proof (you can, obviously,
announce that you’re going to prove it at the beginning, but you don’t
want to use it as an assumption). Write what you are trying to
prove at the bottom of a fresh sheet of paper; this way you don’t
lose sight of your goal.

Next, you may have some assumptions that you are able to use 
(this will be made clearer when we talk about the type of proof you 
are doing in the next section). Write these at the top of the page 
and make sure to flag that they are assumptions (i.e., if you are 
assuming p, write “assume that p,” or “suppose that p”). Finally,
there might be some definitions in the question that you need 
to know. You might be told to use a specific definition, or there 
might be various definitions in the assumptions or conclusion 
that you are working towards. Write these down and ensure that you 
understand what they mean. 

How you set up your proof will also be dependent upon the 
form of the question. The next section provides details on how 
to set up your proof based on the type of sentence. 


A.3 Using Definitions 


We mentioned that you must be familiar with all definitions that 
may be used in the proof, and that you can properly apply them. 
This is a really important point, and it is worth looking at in 
a bit more detail. Definitions are used to abbreviate properties 
and relations so we can talk about them more succinctly. The 
introduced abbreviation is called the definiendum, and what it ab- 
breviates is the definiens. In proofs, we often have to go back to 
how the definiendum was introduced, because we have to exploit 
the logical structure of the definiens (the long version of which 
the defined term is the abbreviation) to get through our proof. By
unpacking definitions, you’re ensuring that you’re getting to the
heart of where the logical action is.

We’ll start with an example. Suppose you want to prove the
following:


Proposition. For any sets A and B, A ∪ B = B ∪ A.


In order to even start the proof, we need to know what it
means for two sets to be identical; i.e., we need to know what 
the “=” in that equation means for sets. Sets are defined to be 
identical whenever they have the same elements. So the definition 
we have to unpack is: 


Definition A.2. Sets A and B are identical, A = B, iff every element
of A is an element of B, and vice versa.


This definition uses A and B as placeholders for arbitrary sets.
What it defines (the definiendum) is the expression “A = B,” by
giving the condition under which A = B is true. This condition,
“every element of A is an element of B, and vice versa,” is the
definiens.¹ The definition specifies that A = B is true if, and only
if (we abbreviate this to “iff”) the condition holds.

When you apply the definition, you have to match the A and
B in the definition to the case you’re dealing with. In our case, it
means that in order for A ∪ B = B ∪ A to be true, each z ∈ A ∪ B
must also be in B ∪ A, and vice versa. The expression A ∪ B in the
proposition plays the role of A in the definition, and B ∪ A that
of B. Since A and B are used both in the definition and in the
statement of the proposition we’re proving, but in different uses,
you have to be careful to make sure you don’t mix up the two.
For instance, it would be a mistake to think that you could prove
the proposition by showing that every element of A is an element
of B, and vice versa: that would show that A = B, not that
A ∪ B = B ∪ A. (Also, since A and B may be any two sets, you won’t get
very far, because if nothing is assumed about A and B they may
well be different sets.)

¹ In this particular case (and very confusingly!) when A = B, the sets A
and B are just one and the same set, even though we use different letters for it
on the left and the right side. But the ways in which that set is picked out may
be different, and that makes the definition non-trivial.

Within the proof we are dealing with set-theoretic notions
such as union, and so we must also know the meaning of the
symbol ∪ in order to understand how the proof should proceed.
And sometimes, unpacking the definition gives rise to
further definitions to unpack. For instance, A ∪ B is defined as
{z : z ∈ A or z ∈ B}. So if you want to prove that x ∈ A ∪ B,
unpacking the definition of ∪ tells you that you have to prove
x ∈ {z : z ∈ A or z ∈ B}. Now you also have to remember that
x ∈ {z : ...z...} iff ...x.... So, further unpacking the definition
of the {z : ...z...} notation, what you have to show is: x ∈ A or
x ∈ B. So, “every element of A ∪ B is also an element of B ∪ A”
really means: “for every x, if x ∈ A or x ∈ B, then x ∈ B or
x ∈ A.” If we fully unpack the definitions in the proposition, we
see that what we have to show is this:


Proposition. For any sets A and B, (a) for every x, if x ∈ A or
x ∈ B, then x ∈ B or x ∈ A, and (b) for every x, if x ∈ B or
x ∈ A, then x ∈ A or x ∈ B.
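The unpacking of "=" and "∪" described above can be mirrored in a short Python sketch. It only checks one pair of finite sets, so it illustrates the definitions rather than proving anything. The helper names `same_elements` and `union` are our own and are not part of the text; they simply transcribe the two definitions.

```python
def same_elements(s, t):
    """Set identity, unpacked: every element of s is in t, and vice versa."""
    return all(x in t for x in s) and all(x in s for x in t)


def union(s, t):
    """Union, unpacked: {z : z in s or z in t}, over the candidates from s and t."""
    candidates = list(s) + list(t)
    return {z for z in candidates if z in s or z in t}


# Two sample sets (an arbitrary choice made for illustration).
A = {1, 2, 3}
B = {3, 4}

# "A ∪ B = B ∪ A" unpacked into the two element-wise conditions.
commutes = same_elements(union(A, B), union(B, A))
print(commutes)
```

Note that `same_elements` takes the two unions as its arguments, which matches the point made above: A ∪ B plays the role of "A" in the definition, and B ∪ A that of "B".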


What’s important is that unpacking definitions is a necessary 
part of constructing a proof. Properly doing it is sometimes diffi- 
cult: you must be careful to distinguish and match the variables 
in the definition and the terms in the claim you’re proving. In 
order to be successful, you must know what the question is ask- 
ing and what all the terms used in the question mean—you will 
often need to unpack more than one definition. In simple proofs 
such as the ones below, the solution follows almost immediately 
from the definitions themselves. Of course, it won’t always be this 
simple.


A.4 Inference Patterns


Proofs are composed of individual inferences. When we make an 
inference, we typically indicate that by using a word like “so,” 
“thus,” or “therefore.” The inference often relies on one or two 
facts we already have available in our proof—it may be something 
we have assumed, or something that we’ve concluded by an in- 
ference already. To be clear, we may label these things, and in 
the inference we indicate what other statements we’re using in the 
inference. An inference will often also contain an explanation of 
why our new conclusion follows from the things that come before 
it. There are some common patterns of inference that are used 
very often in proofs; we'll go through some below. Some patterns 
of inference, like proofs by induction, are more involved (and will 
be discussed later). 

We’ve already discussed one pattern of inference: unpack- 
ing, or applying, a definition. When we unpack a definition, we 
just restate something that involves the definiendum by using the 
definiens. For instance, suppose that we have already established 
in the course of a proof that D = E (a). Then we may apply the 
definition of = for sets and infer: “Thus, by definition from (a), 
every element of D is an element of E and vice versa.”

Somewhat confusingly, we often do not write the justification 
of an inference when we actually make it, but before. Suppose 
we haven’t already proved that D = E, but we want to. If D = E
is the conclusion we aim for, then we can restate this aim also 
by applying the definition: to prove D = E we have to prove 
that every element of D is an element of E and vice versa. So 
our proof will have the form: (a) prove that every element of D 
is an element of E; (b) every element of E is an element of D; 
(c) therefore, from (a) and (b) by definition of =, D = E. But 
we would usually not write it this way. Instead we might write 
something like,


We want to show D = E. By definition of =, this
amounts to showing that every element of D is an
element of E and vice versa.


(a) ...(a proof that every element of D is an element 
of E)... 


(b) ... (a proof that every element of E is an element
of D)...


Using a Conjunction 


Perhaps the simplest inference pattern is that of drawing as
conclusion one of the conjuncts of a conjunction. In other words:
if we have assumed or already proved that p and q, then we’re
entitled to infer that p (and also that q). This is such a basic
inference that it is often not mentioned. For instance, once we’ve
unpacked the definition of D = E we’ve established that every
element of D is an element of E and vice versa. From this we can
conclude that every element of E is an element of D (that’s the
“vice versa” part).


Proving a Conjunction 


Sometimes what you’ll be asked to prove will have the form of a
conjunction; you will be asked to “prove p and q.” In this case,
you simply have to do two things: prove p, and then prove q. You
could divide your proof into two sections, and for clarity, label
them. When you’re making your first notes, you might write “(1)
Prove p” at the top of the page, and “(2) Prove q” in the middle of
the page. (Of course, you might not be explicitly asked to prove
a conjunction but find that your proof requires that you prove a
conjunction. For instance, if you’re asked to prove that D = E
you will find that, after unpacking the definition of =, you have to
prove: every element of D is an element of E and every element
of E is an element of D.)


Proving a Disjunction


When what you are proving takes the form of a disjunction (i.e., it 
is a statement of the form “p or q”), it is enough to show that one
of the disjuncts is true. However, it basically never happens that 
either disjunct just follows from the assumptions of your theorem. 
More often, the assumptions of your theorem are themselves dis- 
junctive, or you're showing that all things of a certain kind have 
one of two properties, but some of the things have the one and 
others have the other property. This is where proof by cases is 
useful (see below). 


Conditional Proof 


Many theorems you will encounter are in conditional form (i.e.,
show that if p holds, then q is also true). These cases are nice and
easy to set up: simply assume the antecedent of the conditional
(in this case, p) and prove the conclusion q from it. So if your
theorem reads, “If p then q,” you start your proof with “assume
p” and at the end you should have proved q.

Conditionals may be stated in different ways. So instead of “If
p then q,” a theorem may state that “p only if q,” “q if p,” or “q,
provided p.” These all mean the same and require assuming p
and proving q from that assumption. Recall that a biconditional
(“p if and only if (iff) q”) is really two conditionals put together:
if p then q, and if q then p. All you have to do, then, is two
instances of conditional proof: one for the first conditional and
another one for the second. Sometimes, however, it is possible
to prove an “iff” statement by chaining together a bunch of other
“iff” statements so that you start with “p” and end with “q,” but
in that case you have to make sure that each step really is an “iff.”


Universal Claims 


Using a universal claim is simple: if something is true for
anything, it’s true for each particular thing. So if, say, the hypothesis
of your proof is A ⊆ B, that means (unpacking the definition
of ⊆), that, for every x ∈ A, x ∈ B. Thus, if you already know
that z ∈ A, you can conclude z ∈ B.

Proving a universal claim may seem a little bit tricky. Usually 
these statements take the following form: “If x has P, then it 
has Q” or “All Ps are Qs.” Of course, it might not fit this form 
perfectly, and it takes a bit of practice to figure out what you're 
asked to prove exactly. But: we often have to prove that all objects 
with some property have a certain other property. 

The way to prove a universal claim is to introduce names
or variables for the things that have the one property and then
show that they also have the other property. We might put this
by saying that to prove something for all Ps you have to prove
it for an arbitrary P. And the name introduced is a name for an
arbitrary P. We typically use single letters as these names for
arbitrary things, and the letters usually follow conventions: e.g.,
we use n for natural numbers, A for formulas, A for sets, f for
functions, etc.

The trick is to maintain generality throughout the proof. You
start by assuming that an arbitrary object (“x”) has the property P,
and show (based only on definitions or what you are allowed
to assume) that x has the property Q. Because you have
not stipulated what x is specifically, other than that it has the
property P, you can assert that every P has the property Q. In
short, x is a stand-in for all things with property P.


Proposition. For all sets A and B, A ⊆ A ∪ B.


Proof. Let A and B be arbitrary sets. We want to show that
A ⊆ A ∪ B. By definition of ⊆, this amounts to: for every x, if x ∈ A
then x ∈ A ∪ B. So let x ∈ A be an arbitrary element of A. We
have to show that x ∈ A ∪ B. Since x ∈ A, x ∈ A or x ∈ B. Thus,
x ∈ {x : x ∈ A ∨ x ∈ B}. But that, by definition of ∪, means
x ∈ A ∪ B. □


Proof by Cases


Suppose you have a disjunction as an assumption or as an already
established conclusion: you have assumed or proved that p or q
is true. You want to prove r. You do this in two steps: first you
assume that p is true, and prove r, then you assume that q is true
and prove r again. This works because we assume or know that
one of the two alternatives holds. The two steps establish that
either one is sufficient for the truth of r. (If both are true, we
have not one but two reasons for why r is true. It is not necessary
to separately prove that r is true assuming both p and q.)
To indicate what we’re doing, we announce that we “distinguish
cases.” For instance, suppose we know that x ∈ B ∪ C. B ∪ C is
defined as {x : x ∈ B or x ∈ C}. In other words, by definition,
x ∈ B or x ∈ C. We would prove that x ∈ A from this by first
assuming that x ∈ B, and proving x ∈ A from this assumption,
and then assume x ∈ C, and again prove x ∈ A from this. You
would write “We distinguish cases” under the assumption, then
“Case (1): x ∈ B” underneath, and “Case (2): x ∈ C” halfway
down the page. Then you’d proceed to fill in the top half and the
bottom half of the page.

Proof by cases is especially useful if what you're proving is 
itself disjunctive. Here’s a simple example: 


Proposition. Suppose B ⊆ D and C ⊆ E. Then B ∪ C ⊆ D ∪ E.


Proof. Assume (a) that B ⊆ D and (b) C ⊆ E. By definition, any
x ∈ B is also ∈ D (c) and any x ∈ C is also ∈ E (d). To show that
B ∪ C ⊆ D ∪ E, we have to show that if x ∈ B ∪ C then x ∈ D ∪ E
(by definition of ⊆). x ∈ B ∪ C iff x ∈ B or x ∈ C (by definition
of ∪). Similarly, x ∈ D ∪ E iff x ∈ D or x ∈ E. So, we have to
show: for any x, if x ∈ B or x ∈ C, then x ∈ D or x ∈ E.


So far we’ve only unpacked definitions! We’ve reformulated
our proposition without ⊆ and ∪ and are left
with trying to prove a universal conditional claim. By
what we’ve discussed above, this is done by assuming
that x is something about which we assume the “if”
part is true, and we’ll go on to show that the “then”
part is true as well. In other words, we’ll assume that
x ∈ B or x ∈ C and show that x ∈ D or x ∈ E.²


Suppose that x ∈ B or x ∈ C. We have to show that x ∈ D or
x ∈ E. We distinguish cases.

Case 1: x ∈ B. By (c), x ∈ D. Thus, x ∈ D or x ∈ E. (Here
we’ve made the inference discussed in the preceding subsection!)

Case 2: x ∈ C. By (d), x ∈ E. Thus, x ∈ D or x ∈ E. □
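As a sanity check on the claim just proved by cases, one can search for a counterexample among all subsets of a small universe. The following Python sketch does this; the function names `subsets` and `find_counterexample` and the three-element universe are assumptions made for illustration. A search like this can only fail to find a counterexample; it does not replace the proof, which covers arbitrary sets.

```python
from itertools import combinations


def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def find_counterexample(universe):
    """Look for B, C, D, E with B ⊆ D and C ⊆ E but not B ∪ C ⊆ D ∪ E."""
    all_subsets = subsets(universe)
    for B in all_subsets:
        for C in all_subsets:
            for D in all_subsets:
                for E in all_subsets:
                    hypotheses = B <= D and C <= E
                    conclusion = (B | C) <= (D | E)
                    if hypotheses and not conclusion:
                        return (B, C, D, E)
    return None


counterexample = find_counterexample({0, 1, 2})
print(counterexample)
```

The two hypotheses and the conclusion are kept as separate named values so that the code follows the shape of the proposition: a conditional whose antecedent is a conjunction.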


Proving an Existence Claim 


When asked to prove an existence claim, the question will usually
be of the form “prove that there is an x such that ...x...”, i.e.,
that some object has the property described by “...x...”. In
this case you’ll have to identify a suitable object and show that it has
the required property. This sounds straightforward, but a proof
of this kind can be tricky. Typically it involves constructing or
defining an object and proving that the object so defined has the
required property. Finding the right object may be hard, proving
that it has the required property may be hard, and sometimes it’s
even tricky to show that you’ve succeeded in defining an object
at all!

Generally, you’d write this out by specifying the object, e.g.,
“let x be ...” (where ... specifies which object you have in mind),
possibly proving that ... in fact describes an object that exists,
and then go on to show that x has the property Q. Here’s a simple
example.


Proposition. Suppose that x ∈ B. Then there is an A such that
A ⊆ B and A ≠ ∅.


Proof. Assume x ∈ B. Let A = {x}.


² This paragraph just explains what we’re doing; it’s not part of the proof,
and you don’t have to go into all this detail when you write down your own
proofs.


Here we’ve defined the set A by enumerating its elements.
Since we assume that x is an object, and we
can always form a set by enumerating its elements,
we don’t have to show that we’ve succeeded in defining
a set A here. However, we still have to show that
A has the properties required by the proposition. The
proof isn’t complete without that!


Since x ∈ A, A ≠ ∅.


This relies on the definition of A as {x} and the obvious
facts that x ∈ {x} and x ∉ ∅.


Since x is the only element of {x}, and x ∈ B, every element of A
is also an element of B. By definition of ⊆, A ⊆ B. □


Using Existence Claims 


Suppose you know that some existence claim is true (you’ve
proved it, or it’s a hypothesis you can use), say, “for some x,
x ∈ A” or “there is an x ∈ A.” If you want to use it in your proof,
you can just pretend that you have a name for one of the things
which your hypothesis says exist. Since A contains at least one
thing, there are things to which that name might refer. You might
of course not be able to pick one out or describe it further (other
than that it is ∈ A). But for the purpose of the proof, you can
pretend that you have picked it out and give a name to it. It’s
important to pick a name that you haven’t already used (or that
appears in your hypotheses), otherwise things can go wrong. In
your proof, you indicate this by going from “for some x, x ∈ A”
to “Let a ∈ A.” Now you can reason about a, use some other
hypotheses, etc., until you come to a conclusion, p. If p no longer
mentions a, p is independent of the assumption that a ∈ A, and
you’ve shown that it follows just from the assumption “for some
x, x ∈ A.”


Proposition. If A ≠ ∅, then A ∪ B ≠ ∅.


Proof. Suppose A ≠ ∅. So for some x, x ∈ A.


Here we first just restated the hypothesis of the proposition.
This hypothesis, i.e., A ≠ ∅, hides an existential
claim, which you get to only by unpacking a few
definitions. The definition of = tells us that A = ∅ iff
every x ∈ A is also ∈ ∅ and every x ∈ ∅ is also ∈ A.
Negating both sides, we get: A ≠ ∅ iff either some
x ∈ A is ∉ ∅ or some x ∈ ∅ is ∉ A. Since nothing is
∈ ∅, the second disjunct can never be true, and “x ∈ A
and x ∉ ∅” reduces to just x ∈ A. So A ≠ ∅ iff for some
x, x ∈ A. That’s an existence claim. Now we use that
existence claim by introducing a name for one of the
elements of A:


Let a ∈ A.


Now we’ve introduced a name for one of the things ∈
A. We’ll continue to argue about a, but we’ll be careful
to only assume that a ∈ A and nothing else:


Since a ∈ A, a ∈ A ∪ B, by definition of ∪. So for some x, x ∈ A ∪ B,
i.e., A ∪ B ≠ ∅.


In that last step, we went from “a ∈ A ∪ B” to “for
some x, x ∈ A ∪ B.” That doesn’t mention a anymore,
so we know that “for some x, x ∈ A ∪ B” follows
from “for some x, x ∈ A” alone. But that means that
A ∪ B ≠ ∅. □


It’s good practice to keep bound variables like “x” separate
from hypothetical names like a, like we did. In practice,
however, we often don’t and just use x, like so:


Suppose A ≠ ∅, i.e., there is an x ∈ A. By definition
of ∪, x ∈ A ∪ B. So A ∪ B ≠ ∅.


However, when you do this, you have to be extra careful that
you use different x’s and y’s for different existential claims. For
instance, the following is not a correct proof of “If A ≠ ∅ and
B ≠ ∅ then A ∩ B ≠ ∅” (which is not true).


Suppose A ≠ ∅ and B ≠ ∅. So for some x, x ∈ A
and also for some x, x ∈ B. Since x ∈ A and x ∈ B,
x ∈ A ∩ B, by definition of ∩. So A ∩ B ≠ ∅.


Can you spot where the incorrect step occurs and explain why 
the result does not hold? 
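Once you have thought about the question, a concrete pair of sets can confirm that the claim itself fails. The Python sketch below uses two one-element sets chosen purely for illustration; the variable names are our own. It gives each existence claim its own name, which is exactly the discipline recommended above.

```python
# Two non-empty sets with nothing in common (an illustrative choice).
A = {1}
B = {2}

# Each existence claim gets its own, fresh name.
a = next(iter(A))  # a name for some element of A
b = next(iter(B))  # a different name for some element of B

both_nonempty = len(A) > 0 and len(B) > 0
intersection_nonempty = len(A & B) > 0

print(both_nonempty, intersection_nonempty, a == b)
```

With separate names a and b, nothing licenses the step "a ∈ A and a ∈ B": we only know a ∈ A and b ∈ B, and here a and b are different objects.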


A.5 An Example 


Our first example is the following simple fact about unions and
intersections of sets. It will illustrate unpacking definitions, proofs
of conjunctions, of universal claims, and proof by cases.


Proposition. For any sets A, B, and C, A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).


Let’s prove it!


Proof. We want to show that for any sets A, B, and C,
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).


First we unpack the definition of “=” in the statement
of the proposition. Recall that proving sets identical
means showing that the sets have the same elements.
That is, all elements of A ∪ (B ∩ C) are also elements
of (A ∪ B) ∩ (A ∪ C), and vice versa. The “vice versa”
means that also every element of (A ∪ B) ∩ (A ∪ C)
must be an element of A ∪ (B ∩ C). So in unpacking the
definition, we see that we have to prove a conjunction.
Let’s record this:


By definition, A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) iff every element
of A ∪ (B ∩ C) is also an element of (A ∪ B) ∩ (A ∪ C), and every
element of (A ∪ B) ∩ (A ∪ C) is an element of A ∪ (B ∩ C).


Since this is a conjunction, we must prove each conjunct
separately. Let’s start with the first: let’s prove
that every element of A ∪ (B ∩ C) is also an element
of (A ∪ B) ∩ (A ∪ C).


This is a universal claim, and so we consider an arbitrary
element of A ∪ (B ∩ C) and show that it must
also be an element of (A ∪ B) ∩ (A ∪ C). We’ll pick a
variable to call this arbitrary element by, say, z. Our
proof continues:


First, we prove that every element of A ∪ (B ∩ C) is also an element
of (A ∪ B) ∩ (A ∪ C). Let z ∈ A ∪ (B ∩ C). We have to show that
z ∈ (A ∪ B) ∩ (A ∪ C).


Now it is time to unpack the definition of ∪ and ∩.
For instance, the definition of ∪ is: A ∪ B = {z :
z ∈ A or z ∈ B}. When we apply the definition to
“A ∪ (B ∩ C),” the role of the “B” in the definition
is now played by “B ∩ C,” so A ∪ (B ∩ C) = {z :
z ∈ A or z ∈ B ∩ C}. So our assumption that z ∈
A ∪ (B ∩ C) amounts to: z ∈ {z : z ∈ A or z ∈ B ∩ C}.
And z ∈ {z : ...z...} iff ...z..., i.e., in this case,
z ∈ A or z ∈ B ∩ C.


By the definition of ∪, either z ∈ A or z ∈ B ∩ C.


Since this is a disjunction, it will be useful to apply
proof by cases. We take the two cases, and show that
in each one, the conclusion we’re aiming for (namely,
“z ∈ (A ∪ B) ∩ (A ∪ C)”) obtains.


Case 1: Suppose that z ∈ A.


There’s not much more to work from based on our
assumptions. So let’s look at what we have to work
with in the conclusion. We want to show that z ∈
(A ∪ B) ∩ (A ∪ C). Based on the definition of ∩, if
we want to show that z ∈ (A ∪ B) ∩ (A ∪ C), we have
to show that it’s in both (A ∪ B) and (A ∪ C). But
z ∈ A ∪ B iff z ∈ A or z ∈ B, and we already have
(as the assumption of case 1) that z ∈ A. By the
same reasoning (switching C for B) z ∈ A ∪ C. This
argument went in the reverse direction, so let’s record
our reasoning in the direction needed in our proof.


Since z ∈ A, z ∈ A or z ∈ B, and hence, by definition of ∪,
z ∈ A ∪ B. Similarly, z ∈ A ∪ C. But this means that
z ∈ (A ∪ B) ∩ (A ∪ C), by definition of ∩.


This completes the first case of the proof by cases.
Now we want to derive the conclusion in the second
case, where z ∈ B ∩ C.


Case 2: Suppose that z ∈ B ∩ C.


Again, we are working with the intersection of two
sets. Let’s apply the definition of ∩:


Since z ∈ B ∩ C, z must be an element of both B and C, by
definition of ∩.


It’s time to look at our conclusion again. We have to
show that z is in both (A ∪ B) and (A ∪ C). And again,
the solution is immediate.


Since z ∈ B, z ∈ (A ∪ B). Since z ∈ C, also z ∈ (A ∪ C). So,
z ∈ (A ∪ B) ∩ (A ∪ C).


Here we applied the definitions of ∪ and ∩ again,
but since we’ve already recalled those definitions, and
already showed that if z is in one of two sets it is in
their union, we don’t have to be as explicit in what
we’ve done.


We’ve completed the second case of the proof by
cases, so now we can assert our first conclusion.


So, if z ∈ A ∪ (B ∩ C) then z ∈ (A ∪ B) ∩ (A ∪ C).


Now we just want to show the other direction, that
every element of (A ∪ B) ∩ (A ∪ C) is an element of
A ∪ (B ∩ C). As before, we prove this universal claim
by assuming we have an arbitrary element of the first
set and show it must be in the second set. Let’s state
what we’re about to do.


Now, assume that z ∈ (A ∪ B) ∩ (A ∪ C). We want to show that
z ∈ A ∪ (B ∩ C).


We are now working from the hypothesis that z ∈
(A ∪ B) ∩ (A ∪ C). It hopefully isn’t too confusing
that we’re using the same z here as in the first part
of the proof. When we finished that part, all the assumptions
we made there are no longer in effect,
so now we can make new assumptions about what z
is. If that is confusing to you, just replace z with a
different variable in what follows.


We know that z is in both A ∪ B and A ∪ C, by definition
of ∩. And by the definition of ∪, we can further
unpack this to: either z ∈ A or z ∈ B, and also either
z ∈ A or z ∈ C. This looks like a proof by cases
again, except the “and” makes it confusing. You
might think that this amounts to there being three
possibilities: z is either in A, B or C. But that would
be a mistake. We have to be careful, so let’s consider
each disjunction in turn.


By definition of ∩, z ∈ A ∪ B and z ∈ A ∪ C. By definition of ∪,
z ∈ A or z ∈ B. We distinguish cases.


Since we’re focusing on the first disjunction, we
haven’t gotten our second disjunction (from unpacking
A ∪ C) yet. In fact, we don’t need it yet. The
first case is z ∈ A, and an element of a set is also
an element of the union of that set with any other. So
case 1 is easy:


Case 1: Suppose that z ∈ A. It follows that z ∈ A ∪ (B ∩ C).


Now for the second case, z ∈ B. Here we’ll unpack
the second ∪ and do another proof-by-cases:


Case 2: Suppose that z ∈ B. Since z ∈ A ∪ C, either z ∈ A or
z ∈ C. We distinguish cases further:

Case 2a: z ∈ A. Then, again, z ∈ A ∪ (B ∩ C).


Ok, this was a bit weird. We didn’t actually need the
assumption that z ∈ B for this case, but that’s ok.


Case 2b: z ∈ C. Then z ∈ B and z ∈ C, so z ∈ B ∩ C, and
consequently, z ∈ A ∪ (B ∩ C).


This concludes both proofs-by-cases and so we’re
done with the second half.


So, if z ∈ (A ∪ B) ∩ (A ∪ C) then z ∈ A ∪ (B ∩ C). □
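The case distinctions in this proof depend only on whether z is in A, in B, and in C. That suggests a small check: tabulate all eight combinations of those three membership facts and compare the two sides. The Python sketch below does this under that assumption; the function names `in_lhs` and `in_rhs` are ours. It is meant as a way to double-check the case analysis, not as a substitute for writing the proof out.

```python
from itertools import product


def in_lhs(in_a, in_b, in_c):
    """z ∈ A ∪ (B ∩ C), unpacked: z ∈ A, or both z ∈ B and z ∈ C."""
    return in_a or (in_b and in_c)


def in_rhs(in_a, in_b, in_c):
    """z ∈ (A ∪ B) ∩ (A ∪ C), unpacked: (z ∈ A or z ∈ B) and (z ∈ A or z ∈ C)."""
    return (in_a or in_b) and (in_a or in_c)


# Every combination of "z ∈ A", "z ∈ B", "z ∈ C" being true or false.
rows = list(product([False, True], repeat=3))

# The two sides agree on a row iff each membership claim implies the other there.
agree = all(in_lhs(*row) == in_rhs(*row) for row in rows)
print(agree)
```

The row where z ∈ A and z ∈ B corresponds to Case 2a in the proof, where the assumption z ∈ B turned out not to be needed.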


A.6 Another Example 


Proposition. If A ⊆ C, then A ∪ (C \ A) = C.


Proof. Suppose that A ⊆ C. We want to show that A ∪ (C \ A) = C.


We begin by observing that this is a conditional state- 
ment. It is tacitly universally quantified: the proposi- 
tion holds for all sets A and C. So A and C are vari- 
ables for arbitrary sets. To prove such a statement, 
we assume the antecedent and prove the consequent.
We continue by using the assumption that A ⊆ C.
Let’s unpack the definition of ⊆: the assumption
means that all elements of A are also elements of C.
Let’s write this down; it’s an important fact that we’ll
use throughout the proof.


By the definition of ⊆, since A ⊆ C, for all z, if z ∈ A, then z ∈ C.


We’ve unpacked all the definitions that are given to
us in the assumption. Now we can move on to the
conclusion. We want to show that A ∪ (C \ A) = C,
and so we set up a proof similarly to the last example:
we show that every element of A ∪ (C \ A) is also
an element of C and, conversely, every element of C
is an element of A ∪ (C \ A). We can shorten this to:
A ∪ (C \ A) ⊆ C and C ⊆ A ∪ (C \ A). (Here we’re
doing the opposite of unpacking a definition, but it
makes the proof a bit easier to read.) Since this is a
conjunction, we have to prove both parts. To show the
first part, i.e., that every element of A ∪ (C \ A) is also
an element of C, we assume that z ∈ A ∪ (C \ A) for
an arbitrary z and show that z ∈ C. By the definition
of ∪, we can conclude that z ∈ A or z ∈ C \ A from
z ∈ A ∪ (C \ A). You should now be getting the hang
of this.


A ∪ (C \ A) = C iff A ∪ (C \ A) ⊆ C and C ⊆ A ∪ (C \ A). First
we prove that A ∪ (C \ A) ⊆ C. Let z ∈ A ∪ (C \ A). So, either
z ∈ A or z ∈ (C \ A).


We’ve arrived at a disjunction, and from it we want 
to prove that z € C. We do this using proof by cases. 


Case 1: z ∈ A. Since for all z, if z ∈ A, z ∈ C, we have that z ∈ C.


Here we’ve used the fact recorded earlier which followed
from the hypothesis of the proposition that
A ⊆ C. The first case is complete, and we turn to
the second case, z ∈ (C \ A). Recall that C \ A denotes
the difference of the two sets, i.e., the set of all
elements of C which are not elements of A. But any
element of C not in A is in particular an element of C.


Case 2: z ∈ (C \ A). This means that z ∈ C and z ∉ A. So, in
particular, z ∈ C.


Great, we’ve proved the first direction. Now for the
second direction. Here we prove that C ⊆ A ∪ (C \ A).
So we assume that z ∈ C and prove that z ∈ A ∪ (C \ A).


Now let z ∈ C. We want to show that z ∈ A or z ∈ C \ A.


Since all elements of A are also elements of C, and 
C \ A is the set of all things that are elements of C but 
not A, it follows that z is either in A or in C \ A. This 
may be a bit unclear if you don’t already know why 
the result is true. It would be better to prove it 
step-by-step. It will help to use a simple fact which we can 
state without proof: z ∈ A or z ∉ A. This is called the 
“principle of excluded middle:” for any statement p, 
either p is true or its negation is true. (Here, p is the 
statement that z ∈ A.) Since this is a disjunction, we 
can again use proof-by-cases. 


Either z ∈ A or z ∉ A. In the former case, z ∈ A ∪ (C \ A). In the 
latter case, z ∈ C and z ∉ A, so z ∈ C \ A. But then z ∈ A ∪ (C \ A). 


Our proof is complete: we have shown that A ∪ (C \ 
A) = C. □ 
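
A result like this can also be sanity-checked on small finite sets. The following Python sketch is only an illustration: the function names and the three-element universe are our own assumptions, not part of the proof. It enumerates all pairs of sets A and C with A ⊆ C drawn from a small universe and tests both inclusions separately, mirroring the two directions of the proof. A check like this covers only finitely many sets, so it cannot replace the proof.

```python
from itertools import combinations

def subsets(universe):
    """Return all subsets of `universe` as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def identity_holds(a, c):
    """Check that A union (C minus A) equals C, one inclusion at a time."""
    left = a | (c - a)
    first_direction = all(z in c for z in left)    # left side is a subset of C
    second_direction = all(z in left for z in c)   # C is a subset of the left side
    return first_direction and second_direction

def check_all(universe):
    """Check the identity for every pair of subsets A, C with A a subset of C."""
    return all(identity_holds(a, c)
               for c in subsets(universe)
               for a in subsets(universe)
               if a <= c)

print(check_all({1, 2, 3}))  # True
```

If the hypothesis that A ⊆ C is dropped, the first inclusion can fail: with A = {1} and C = {2}, the left side is {1, 2}, which is not a subset of C.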


A.7 Proof by Contradiction 


In the first instance, proof by contradiction is an inference pattern 
that is used to prove negative claims. Suppose you want to 
show that some claim p is false, i.e., you want to show ¬p. The 
most promising strategy is to (a) suppose that p is true, and (b) 
show that this assumption leads to something you know to be 
false. “Something known to be false” may be a result that conflicts 
with (that is, contradicts) p itself, or some other hypothesis of the 
overall claim you are considering. For instance, a proof of “if q 
then ¬p” involves assuming that q is true and proving ¬p from 
it. If you prove ¬p by contradiction, that means assuming p in 
addition to q. If you can prove ¬q from p, you have shown that 
the assumption p leads to something that contradicts your other 
assumption q, since q and ¬q cannot both be true. Of course, 
you have to use other inference patterns in your proof of the 
contradiction, as well as unpacking definitions. Let’s consider an 
example. 


Proof. Suppose A ⊆ B and B = ∅. We want to show that A has 
no elements. 


Since this is a conditional claim, we assume the antecedent 
and want to prove the consequent. The consequent 
is: A has no elements. We can make that a bit 
more explicit: it’s not the case that there is an x ∈ A. 


A has no elements iff it’s not the case that there is an x such that 
x ∈ A. 


So we’ve determined that what we want to prove is 
really a negative claim ¬p, namely: it’s not the case 
that there is an x ∈ A. To use proof by contradiction, 
we have to assume the corresponding positive 
claim p, i.e., there is an x ∈ A, and prove a contradiction 
from it. We indicate that we’re doing a proof 
by contradiction by writing “by way of contradiction, 
assume” or even just “suppose not,” and then state 
the assumption p. 


Suppose not: there is an x ∈ A. 


This is now the new assumption we’ll use to obtain a 
contradiction. We have two more assumptions: that 
A ⊆ B and that B = ∅. The first gives us that x ∈ B: 


Since A ⊆ B, x ∈ B. 


But since B = ∅, every element of B (e.g., x) must 
also be an element of ∅. 


Since B = ∅, x ∈ ∅. This is a contradiction, since by definition ∅ 
has no elements. 


This already completes the proof: we’ve arrived at 
what we need (a contradiction) from the assumptions 
we’ve set up, and this means that the assumptions 
can’t all be true. Since the first two assumptions (A ⊆ 
B and B = ∅) are not contested, it must be the last 
assumption introduced (there is an x ∈ A) that is 
false. But if we want to be thorough, we can spell 
this out. 


Thus, our assumption that there is an x ∈ A must be false, hence, 
A has no elements by proof by contradiction. □ 


Every positive claim is trivially equivalent to a negative claim: 
p iff ¬¬p. So proofs by contradiction can also be used to establish 
positive claims “indirectly,” as follows: To prove p, read it as the 
negative claim ¬¬p. If we can prove a contradiction from ¬p, 
we’ve established ¬¬p by proof by contradiction, and hence p. 

In the last example, we aimed to prove a negative claim, 
namely that A has no elements, and so the assumption we made 
for the purpose of proof by contradiction (i.e., that there is an 
x ∈ A) was a positive claim. It gave us something to work with, 
namely the hypothetical x ∈ A about which we continued to reason 
until we got to x ∈ ∅. 

When proving a positive claim indirectly, the assumption 
you’d make for the purpose of proof by contradiction would be 
negative. But very often you can easily reformulate a positive 
claim as a negative claim, and a negative claim as a positive 
claim. Our previous proof would have been essentially the same 
had we proved “A = ∅” instead of the negative consequent “A 
has no elements.” (By definition of =, “A = ∅” is a general claim, 
since it unpacks to “every element of A is an element of ∅ and 
vice versa”.) But it is easily seen to be equivalent to the negative 
claim “not: there is an x ∈ A.” 

So it is sometimes easier to work with ¬p as an assumption 
than it is to prove p directly. Even when a direct proof is just as 
simple or even simpler (as in the next examples), some people 
prefer to proceed indirectly. If the double negation confuses you, 
think of a proof by contradiction of some claim as a proof of a 
contradiction from the opposite claim. So, a proof by contradiction 
of ¬p is a proof of a contradiction from the assumption p; and 
proof by contradiction of p is a proof of a contradiction from ¬p. 


Proof. We want to show that A ⊆ A ∪ B. 


On the face of it, this is a positive claim: every x ∈ A 
is also in A ∪ B. The negation of that is: some x ∈ 
A is ∉ A ∪ B. So we can prove the claim indirectly 
by assuming this negated claim, and showing that it 
leads to a contradiction. 


Suppose not, i.e., A ⊈ A ∪ B. 


We have a definition of A ⊆ A ∪ B: every x ∈ A is 
also ∈ A ∪ B. To understand what A ⊈ A ∪ B means, 
we have to use some elementary logical manipulation 
on the unpacked definition: it’s false that every x ∈ A 
is also ∈ A ∪ B iff there is some x ∈ A that is ∉ A ∪ B. 
(This is a place where you want to be very careful: 
many students’ attempted proofs by contradiction fail 
because they analyze the negation of a claim like “all 
As are Bs” incorrectly.) In other words, A ⊈ A ∪ B iff 
there is an x such that x ∈ A and x ∉ A ∪ B. From 
then on, it’s easy. 


So, there is an x ∈ A such that x ∉ A ∪ B. By definition of ∪, 
x ∈ A ∪ B iff x ∈ A or x ∈ B. Since x ∈ A, we have x ∈ A ∪ B. 
This contradicts the assumption that x ∉ A ∪ B. □ 
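
The warning about negating a claim like “all As are Bs” can be made concrete with a small Python sketch. The function names and the two example sets are hypothetical, chosen only for illustration. The correct negation of “every x in A is in B” is “some x in A is not in B”; the tempting but wrong reading is “every x in A is not in B,” which is a much stronger claim.

```python
def is_subset(a, b):
    """Every x in A is also in B."""
    return all(x in b for x in a)

def has_counterexample(a, b):
    """Correct negation: some x in A is not in B."""
    return any(x not in b for x in a)

def wrong_negation(a, b):
    """A common mistake: every x in A is not in B (too strong)."""
    return all(x not in b for x in a)

a, b = {1, 2}, {2, 3}
print(is_subset(a, b))           # False: 1 is in A but not in B
print(has_counterexample(a, b))  # True: x = 1 is a witness
print(wrong_negation(a, b))      # False: 2 is in both A and B
```

Here A is not a subset of B, and the correct negation agrees with that, while the mistaken one does not.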


Proof. Suppose A ⊆ B and B ⊆ C. We want to show A ⊆ C. 


Let’s proceed indirectly: we assume the negation of 
what we want to establish. 


Suppose not, i.e., A ⊈ C. 


As before, we reason that A ⊈ C iff not every x ∈ A 
is also ∈ C, i.e., some x ∈ A is ∉ C. Don’t worry, 
with practice you won’t have to think hard anymore 
to unpack negations like this. 


In other words, there is an x such that x ∈ A and x ∉ C. 


Now we can use this to get to our contradiction. Of 
course, we’ll have to use the other two assumptions 
to do it. 


Since A ⊆ B, x ∈ B. Since B ⊆ C, x ∈ C. But this contradicts 
x ∉ C. □ 


Proof. Suppose A ∪ B = A ∩ B. We want to show that A = B. 


The beginning is now routine: 


Assume, by way of contradiction, that A ≠ B. 


Our assumption for the proof by contradiction is that 
A ≠ B. Since A = B iff A ⊆ B and B ⊆ A, we get that 
A ≠ B iff A ⊈ B or B ⊈ A. (Note how important it is 
to be careful when manipulating negations!) To prove 
a contradiction from this disjunction, we use a proof 
by cases and show that in each case, a contradiction 
follows. 


A ≠ B iff A ⊈ B or B ⊈ A. We distinguish cases. 


In the first case, we assume A ⊈ B, i.e., for some x, 
x ∈ A but x ∉ B. A ∩ B is defined as those elements 
that A and B have in common, so if something isn’t 
in one of them, it’s not in the intersection. A ∪ B is 
A together with B, so anything in either is also in the 
union. This tells us that x ∈ A ∪ B but x ∉ A ∩ B, and 
hence that A ∩ B ≠ A ∪ B. 


Case 1: A ⊈ B. Then for some x, x ∈ A but x ∉ B. Since 
x ∉ B, then x ∉ A ∩ B. Since x ∈ A, x ∈ A ∪ B. So, A ∩ B ≠ A ∪ B, 
contradicting the assumption that A ∩ B = A ∪ B. 

Case 2: B ⊈ A. Then for some y, y ∈ B but y ∉ A. As before, 
we have y ∈ A ∪ B but y ∉ A ∩ B, and so A ∩ B ≠ A ∪ B, again 
contradicting A ∩ B = A ∪ B. □ 


A.8 Reading Proofs 


Proofs you find in textbooks and articles very seldom give all the 
details we have so far included in our examples. Authors often 
do not draw attention to when they distinguish cases, when they 
give an indirect proof, or don’t mention that they use a definition. 
So when you read a proof in a textbook, you will often have to 
fill in those details for yourself in order to understand the proof. 
Doing this is also good practice to get the hang of the various 
moves you have to make in a proof. Let’s look at an example. 


Proof. If z ∈ A ∩ (A ∪ B), then z ∈ A, so A ∩ (A ∪ B) ⊆ A. 
Now suppose z ∈ A. Then also z ∈ A ∪ B, and therefore also 
z ∈ A ∩ (A ∪ B). □ 


The preceding proof of the absorption law is very condensed. 
There is no mention of any definitions used, no “we have to prove 
that” before we prove it, etc. Let’s unpack it. The proposition 
proved is a general claim about any sets A and B, and when the 
proof mentions A or B, these are variables for arbitrary sets. The 
general claim the proof establishes is what’s required to prove 
identity of sets, i.e., that every element of the left side of the 
identity is an element of the right and vice versa. 


“If z ∈ A ∩ (A ∪ B), then z ∈ A, so A ∩ (A ∪ B) ⊆ A.” 


This is the first half of the proof of the identity: it establishes 
that if an arbitrary z is an element of the left side, it is also 
an element of the right, i.e., A ∩ (A ∪ B) ⊆ A. Assume that 
z ∈ A ∩ (A ∪ B). Since z is an element of the intersection of two 
sets iff it is an element of both sets, we can conclude that z ∈ A 
and also z ∈ A ∪ B. In particular, z ∈ A, which is what we wanted 
to show. Since that’s all that has to be done for the first half, we 
know that the rest of the proof must be a proof of the second half, 
i.e., a proof that A ⊆ A ∩ (A ∪ B). 


“Now suppose z ∈ A. Then also z ∈ A ∪ B, and 
therefore also z ∈ A ∩ (A ∪ B).” 


We start by assuming that z ∈ A, since we are showing that, 
for any z, if z ∈ A then z ∈ A ∩ (A ∪ B). To show that z ∈ A ∩ (A ∪ B), 
we have to show (by definition of “∩”) that (i) z ∈ A and also (ii) 
z ∈ A ∪ B. Here (i) is just our assumption, so there is nothing 
further to prove, and that’s why the proof does not mention it 
again. For (ii), recall that z is an element of a union of sets 
iff it is an element of at least one of those sets. Since z ∈ A, 
and A ∪ B is the union of A and B, this is the case here. So 
z ∈ A ∪ B. We’ve shown both (i) z ∈ A and (ii) z ∈ A ∪ B, hence, 
by definition of “∩,” z ∈ A ∩ (A ∪ B). The proof doesn’t mention 
those definitions; it’s assumed the reader has already internalized 
them. If you haven’t, you’ll have to go back and remind yourself 
what they are. Then you’ll also have to recognize why it follows 
from z ∈ A that z ∈ A ∪ B, and from z ∈ A and z ∈ A ∪ B that 
z ∈ A ∩ (A ∪ B). 

Here’s another version of the proof above, with everything 
made explicit: 


Proof. [By definition of = for sets, to prove A ∩ (A ∪ B) = A we have to show 
(a) A ∩ (A ∪ B) ⊆ A and (b) A ⊆ A ∩ (A ∪ B). (a): By definition 
of ⊆, we have to show that if z ∈ A ∩ (A ∪ B), then z ∈ A.] If 
z ∈ A ∩ (A ∪ B), then z ∈ A [since by definition of ∩, z ∈ A ∩ (A ∪ B) 
iff z ∈ A and z ∈ A ∪ B], so A ∩ (A ∪ B) ⊆ A. [(b): By definition 
of ⊆, we have to show that if z ∈ A, then z ∈ A ∩ (A ∪ B).] Now 
suppose [(1)] z ∈ A. Then also [(2)] z ∈ A ∪ B [since by (1) z ∈ A 
or z ∈ B, which by definition of ∪ means z ∈ A ∪ B], and therefore 
also z ∈ A ∩ (A ∪ B) [since the definition of ∩ requires that z ∈ A, 
i.e., (1), and z ∈ A ∪ B, i.e., (2)]. □ 


A.9 I Can’t Do It! 


We all get to a point where we feel like giving up. But you can do 
it. Your instructor and teaching assistant, as well as your fellow 
students, can help. Ask them for help! Here are a few tips to help 
you avoid a crisis, and what to do if you feel like giving up. 

To make sure you can solve problems successfully, do the following: 


1. Start as far in advance as possible. We get busy throughout 
the semester and many of us struggle with procrastination, 
so one of the best things you can do is to start your homework 
assignments early. That way, if you’re stuck, you have time 
to look for a solution (that isn’t crying). 


2. Talk to your classmates. You are not alone. Others in the 
class may also struggle, but they may struggle with different 
things. Talking it out with your peers can give you 
a different perspective on the problem that might lead to 
a breakthrough. Of course, don’t just copy their solution: 
ask them for a hint, or explain where you get stuck and ask 
them for the next step. And when you do get it, recipro- 
cate. Helping someone else along, and explaining things 
will help you understand better, too. 


3. Ask for help. You have many resources available to you— 
your instructor and teaching assistant are there for you 
and want you to succeed. They should be able to help 
you work out a problem and identify where in the process 
you're struggling. 


4. Take a break. If you’re stuck, it might be because you’ve been 
staring at the problem for too long. Take a short break, 
have a cup of tea, or work on a different problem for a 
while, then return to the problem with a fresh mind. Sleep 
on it. 


Notice how these strategies require that you’ve started to work 
on the proof well in advance? If you’ve started the proof at 2am 
the day before it’s due, these might not be so helpful. 

This might sound like doom and gloom, but solving a proof 
is a challenge that pays off in the end. Some people do this as 
a career, so there must be something to enjoy about it. Like 
basically everything, solving problems and doing proofs is something 
that requires practice. You might see classmates who find 
this easy: they’ve probably just had lots of practice already. Try 
not to give in too easily. 

If you do run out of time (or patience) on a particular prob- 
lem: that’s ok. It doesn’t mean you're stupid or that you will never 
get it. Find out (from your instructor or another student) how it 
is done, and identify where you went wrong or got stuck, so you 
can avoid doing that the next time you encounter a similar issue. 
Then try to do it without looking at the solution. And next time, 
start (and ask for help) earlier. 


A.10 Other Resources 


There are many books on how to do proofs in mathematics which 
may be useful. Check out How to Read and do Proofs: An Intro- 
duction to Mathematical Thought Processes (Solow, 2013) and How 
to Prove It: A Structured Approach (Velleman, 2019) in particular. 
The Book of Proof (Hammack, 2013) and Mathematical Reasoning 
(Sandstrum, 2019) are books on proof that are freely available 
online. Philosophers might find More Precisely: The Math you need 
to do Philosophy (Steinhart, 2018) to be a good primer on mathe- 
matical reasoning. 

There are also various shorter guides to proofs available on 
the internet; e.g., “Introduction to Mathematical Arguments” 
(Hutchings, 2003) and “How to write proofs” (Cheng, 2004). 


Motivational Videos 


Feel like you have no motivation to do your homework? Feeling 
down? These videos might help! 


• https://www.youtube.com/watch?v=ZXsQAXx_ao0 
• https://www.youtube.com/watch?v=BQ4yd2W50No 
• https://www.youtube.com/watch?v=StTqxEQ21-Y 


Problems 


Problem A.1. Suppose you are asked to prove that A ∩ B ≠ ∅. 
Unpack all the definitions occurring here, i.e., restate this in a way 
that does not mention “∩”, “=”, or “∅”. 


Problem A.2. Prove indirectly that A ∩ B ⊆ A. 


Problem A.3. Expand the following proof of A ∪ (A ∩ B) = A, 
where you mention all the inference patterns used, why each step 
follows from assumptions or claims established before it, and 
where we have to appeal to which definitions. 


Proof. If z ∈ A ∪ (A ∩ B) then z ∈ A or z ∈ A ∩ B. If z ∈ A ∩ B, 
z ∈ A. Any z ∈ A is also ∈ A ∪ (A ∩ B). □ 


Induction 


B.1 Introduction 


Induction is an important proof technique which is used, in dif- 
ferent forms, in almost all areas of logic, theoretical computer 
science, and mathematics. It is needed to prove many of the re- 
sults in logic. 

Induction is often contrasted with deduction, and character- 
ized as the inference from the particular to the general. For in- 
stance, if we observe many green emeralds, and nothing that we 
would call an emerald that’s not green, we might conclude that 
all emeralds are green. This is an inductive inference, in that it 
proceeds from many particular cases (this emerald is green, that 
emerald is green, etc.) to a general claim (all emeralds are green). 
Mathematical induction is also an inference that concludes a gen- 
eral claim, but it is of a very different kind than this “simple 
induction.” 

Very roughly, an inductive proof in mathematics concludes 
that all mathematical objects of a certain sort have a certain prop- 
erty. In the simplest case, the mathematical objects an inductive 
proof is concerned with are natural numbers. In that case an in- 
ductive proof is used to establish that all natural numbers have 
some property, and it does this by showing that 


1. 0 has the property, and 

2. whenever a number k has the property, so does k + 1. 


Induction on natural numbers can then also often be used to 
prove general claims about mathematical objects that can be 
assigned numbers. For instance, finite sets each have a finite number 
n of elements, and if we can use induction to show that every 
number n has the property “all finite sets of size n are...” then 
we will have shown something about all finite sets. 

Induction can also be generalized to mathematical objects 
that are inductively defined. For instance, expressions of a formal 
language such as those of first-order logic are defined inductively. 
Structural induction is a way to prove results about all such expres- 
sions. Structural induction, in particular, is very useful—and 
widely used—in logic. 


B.2 Induction on N 


In its simplest form, induction is a technique used to prove results 
for all natural numbers. It uses the fact that by starting from 0 
and repeatedly adding 1 we eventually reach every natural number. 
So to prove that something is true for every number, we can 
(1) establish that it is true for 0 and (2) show that whenever it is 
true for a number n, it is also true for the next number n + 1. If we 
abbreviate “number n has property P” by P(n) (and “number k 
has property P” by P(k), etc.), then a proof by induction that 
P(n) for all n ∈ N consists of: 


1. a proof of P(0), and 
2. a proof that, for any k, if P(k) then P(k +1). 


To make this crystal clear, suppose we have both (1) and (2). 
Then (1) tells us that P(0) is true. If we also have (2), we know 
in particular that if P(0) then P(0 + 1), i.e., P(1). This follows 
from the general statement “for any k, if P(k) then P(k + 1)” by 
putting 0 for k. So by modus ponens, we have that P(1). From (2) 
again, now taking 1 for k, we have: if P(1) then P(2). Since we’ve 
just established P(1), by modus ponens, we have P(2). And so 
on. For any number n, after doing this n times, we eventually 
arrive at P(n). So (1) and (2) together establish P(n) for any 
n ∈ N. 

Let’s look at an example. Suppose we want to find out how 
many different sums we can throw with n dice. Although it might 
seem silly, let’s start with 0 dice. If you have no dice there’s only 
one possible sum you can “throw”: no dots at all, which sums 
to 0. So the number of different possible throws is 1. If you have 
only one die, i.e., n = 1, there are six possible values, 1 through 6. 
With two dice, we can throw any sum from 2 through 12, that’s 11 
possibilities. With three dice, we can throw any number from 3 to 
18, i.e., 16 different possibilities. 1, 6, 11, 16: looks like a pattern: 
maybe the answer is 5n + 1? Of course, 5n + 1 is the maximum 
possible, because there are only 5n + 1 numbers between n, the 
lowest value you can throw with n dice (all 1’s) and 6n, the highest 
you can throw (all 6’s). 


Proof. Let P(n) be the claim: “It is possible to throw any number 
between n and 6n using n dice.” To use induction, we prove: 


1. The induction basis P(1), i.e., with just one die, you can 
throw any number between 1 and 6. 


2. The induction step, for all k, if P(k) then P(k + 1). 


(1) is proved by inspecting a 6-sided die. It has all 6 sides, 
and every number between 1 and 6 shows up on one of the sides. 
So it is possible to throw any number between 1 and 6 using a 
single die. 

To prove (2), we assume the antecedent of the conditional, 
i.e., P(k). This assumption is called the inductive hypothesis. We 
use it to prove P(k + 1). The hard part is to find a way of thinking 
about the possible values of a throw of k + 1 dice in terms of the 
possible values of throws of k dice plus throws of the extra 
(k + 1)-st die. This is what we have to do, though, if we want to use 
the inductive hypothesis. 

The inductive hypothesis says we can get any number between 
k and 6k using k dice. If we throw a 1 with our (k + 1)-st die, this 
adds 1 to the total. So we can throw any value between k + 1 and 
6k + 1 by throwing k dice and then rolling a 1 with the (k + 1)-st 
die. What’s left? The values 6k + 2 through 6k + 6. We can get 
these by rolling k 6s and then a number between 2 and 6 with 
our (k + 1)-st die. Together, this means that with k + 1 dice we 
can throw any of the numbers between k + 1 and 6(k + 1), i.e., 
we’ve proved P(k + 1) using the assumption P(k), the inductive 
hypothesis. □ 
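
For small n, the claim about dice can be checked by brute force. The sketch below is an illustration only; the function name and the range of n tested are our own choices. It enumerates every possible throw of n six-sided dice and confirms that the sums are exactly the numbers from n to 6n, of which there are 5n + 1. This confirms the pattern for a few values of n, whereas the induction proof covers all of them.

```python
from itertools import product

def possible_sums(n):
    """Return the set of all totals that n six-sided dice can show."""
    return {sum(throw) for throw in product(range(1, 7), repeat=n)}

for n in range(0, 5):
    sums = possible_sums(n)
    # Every number between n and 6n occurs, and nothing else does.
    assert sums == set(range(n, 6 * n + 1))
    # So there are exactly 5n + 1 different sums.
    assert len(sums) == 5 * n + 1
    print(n, len(sums))
```

Note that the case n = 0 works out as described in the text: the only “throw” is the empty one, with sum 0.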


Very often we use induction when we want to prove something 
about a series of objects (numbers, sets, etc.) that is itself defined 
“inductively,” i.e., by defining the (n + 1)-st object in terms of the 
n-th. For instance, we can define the sum s_n of the natural numbers 
up to n by 


s_0 = 0 

s_{n+1} = s_n + (n + 1) 


This definition gives: 


s_0 = 0, 

s_1 = s_0 + 1 = 1, 

s_2 = s_1 + 2 = 1 + 2 = 3, 

s_3 = s_2 + 3 = 1 + 2 + 3 = 6, etc. 


Now we can prove, by induction, that s_n = n(n + 1)/2. 


Proof. We have to prove (1) that s_0 = 0 · (0 + 1)/2 and (2) if 
s_k = k(k + 1)/2 then s_{k+1} = (k + 1)(k + 2)/2. (1) is obvious. To 
prove (2), we assume the inductive hypothesis: s_k = k(k + 1)/2. 
Using it, we have to show that s_{k+1} = (k + 1)(k + 2)/2. 

What is s_{k+1}? By the definition, s_{k+1} = s_k + (k + 1). By 
inductive hypothesis, s_k = k(k + 1)/2. We can substitute this into 
the previous equation, and then just need a bit of arithmetic of 
fractions: 


s_{k+1} = k(k + 1)/2 + (k + 1) 
        = k(k + 1)/2 + 2(k + 1)/2 
        = (k(k + 1) + 2(k + 1))/2 
        = (k + 2)(k + 1)/2. 


□ 
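
The inductive definition of the sums translates directly into a recursive function, and the closed form can be compared against it for small cases. This Python sketch uses function names of our own choosing and is meant only as a check of the formula on finitely many values, not as a replacement for the induction.

```python
def s(n):
    """Sum of the natural numbers up to n, following the inductive definition."""
    if n == 0:
        return 0               # the sum for 0 is 0
    return s(n - 1) + n        # the sum for n is the sum for n - 1, plus n

def closed_form(n):
    """The closed form n(n + 1)/2 established by the induction proof."""
    return n * (n + 1) // 2

for n in range(50):
    assert s(n) == closed_form(n)

print([s(n) for n in range(5)])  # [0, 1, 3, 6, 10]
```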


The important lesson here is that if you’re proving something 
about some inductively defined sequence a_n, induction is the obvious 
way to go. And even if it isn’t (as in the case of the possibilities 
of dice throws), you can use induction if you can somehow 
relate the case for k + 1 to the case for k. 


B.3 Strong Induction 


In the principle of induction discussed above, we prove P(0) and 
also if P(k), then P(k + 1). In the second part, we assume that 
P(k) is true and use this assumption to prove P(k + 1). Equivalently, 
of course, we could assume P(k − 1) and use it to prove 
P(k). The important part is that we be able to carry out the 
inference from any number to its successor; that we can prove the 
claim in question for any number under the assumption it holds 
for its predecessor. 

There is a variant of the principle of induction in which we 
don’t just assume that the claim holds for the predecessor k − 1 
of k, but for all numbers smaller than k, and use this assumption 
to establish the claim for k. This also gives us the claim 
P(n) for all n ∈ N. For once we have established P(0), we have 
thereby established that P holds for all numbers less than 1. And 
if we know that if P(l) for all l < k, then P(k), we know this 
in particular for k = 1. So we can conclude P(1). With this we 
have proved P(0) and P(1), i.e., P(l) for all l < 2, and since we 
have also the conditional, if P(l) for all l < 2, then P(2), we can 
conclude P(2), and so on. 

In fact, if we can establish the general conditional “for all k, 
if P(l) for all l < k, then P(k),” we do not have to establish P(0) 
anymore, since it follows from it. For remember that a general 
claim like “for all l < k, P(l)” is true if there are no l < k. This 
is a case of vacuous quantification: “all As are Bs” is true if there 
are no As, ∀x (A(x) → B(x)) is true if no x satisfies A(x). In this 
case, the formalized version would be “∀l (l < k → P(l))”, and 
that is true if there are no l < k. And if k = 0 that’s exactly the 
case: no l < 0, hence “for all l < 0, P(l)” is true, whatever P is. 
A proof of “if P(l) for all l < k, then P(k)” thus automatically 
establishes P(0). 

This variant is useful if establishing the claim for k can’t be 
made to just rely on the claim for k − 1 but may require the 
assumption that it is true for one or more l < k. 
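
Vacuous quantification may feel odd at first, but programming languages typically treat it the same way. In Python, the built-in all returns True on an empty collection, so “for all l < 0, P(l)” comes out true whatever P is. The sketch below uses hypothetical names of our own, including a property that no number has, to show this.

```python
def holds_below(p, k):
    """'For all l < k, P(l)', with l ranging over the natural numbers."""
    return all(p(l) for l in range(k))

def never(l):
    """A hypothetical property that no number has."""
    return False

print(holds_below(never, 0))  # True: there is no l < 0, so the claim is vacuous
print(holds_below(never, 1))  # False: l = 0 is a counterexample
```

This is why the antecedent of the strong induction conditional is automatically satisfied when k = 0.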


B.4 Inductive Definitions 


In logic we very often define kinds of objects inductively, i.e., by 
specifying rules for what counts as an object of the kind to be 
defined which explain how to get new objects of that kind from 
old objects of that kind. For instance, we often define special 
kinds of sequences of symbols, such as the terms and formulas of 
a language, by induction. For a simple example, consider strings 
consisting of letters a, b, c, d, the symbol ◦, and brackets [ and 
], such as “[[c ◦ d][”, “[a[]◦]”, “a” or “[[a ◦ b] ◦ d]”. You probably 
feel that there’s something “wrong” with the first two strings: the 
brackets don’t “balance” at all in the first, and you might feel that 
the “◦” should “connect” expressions that themselves make sense. 
The third and fourth string look better: for every “[” there’s a 
closing “]” (if there are any at all), and for any ◦ we can find “nice” 
expressions on either side, surrounded by a pair of brackets. 

We would like to precisely specify what counts as a “nice 
term.” First of all, every letter by itself is nice. Anything that’s 
not just a letter by itself should be of the form “[t ◦ s]” where s 
and t are themselves nice. Conversely, if t and s are nice, then we 
can form a new nice term by putting a ◦ between them and 
surrounding them by a pair of brackets. We might use these operations 
to define the set of nice terms. This is an inductive definition. 


Definition B.3 (Nice terms). The set of nice terms is inductively 
defined as follows: 


1. Any letter a, b, c, d is a nice term. 


2. If s_1 and s_2 are nice terms, then so is [s_1 ◦ s_2]. 


3. Nothing else is a nice term. 


This definition tells us that something counts as a nice term iff 
it can be constructed according to the two conditions (1) and (2) 
in some finite number of steps. In the first step, we construct all 
nice terms just consisting of letters by themselves, i.e., 


a, b, c, d 


In the second step, we apply (2) to the terms we’ve constructed. 
We’ll get 

[a ◦ a], [a ◦ b], [b ◦ a], ..., [d ◦ d] 


for all combinations of two letters. In the third step, we apply 
(2) again, to any two nice terms we’ve constructed so far. We get 
new nice terms such as [a ◦ [a ◦ a]], where t is a from step 1 and s 
is [a ◦ a] from step 2, and [[b ◦ c] ◦ [d ◦ b]], constructed out of the 
two terms [b ◦ c] and [d ◦ b] from step 2. And so on. Clause (3) 
rules out that anything not constructed in this way sneaks into 
the set of nice terms. 

Note that we have not yet proved that every sequence of symbols 
that “feels” nice is nice according to this definition. However, 
it should be clear that everything we can construct does in fact 
“feel nice”: brackets are balanced, and ◦ connects parts that are 
themselves nice. 

The key feature of inductive definitions is that if you want 
to prove something about all nice terms, the definition tells you 
which cases you must consider. For instance, if you are told that 
t is a nice term, the inductive definition tells you what t can look 
like: t can be a letter, or it can be [s_1 ◦ s_2] for some pair of 
nice terms s_1 and s_2. Because of clause (3), those are the only 
possibilities. 
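
The step-by-step construction of nice terms described above can be sketched in Python. This is an illustration under our own assumptions: terms are represented as strings, the connecting symbol is written ◦, spaces are left out, and the function names are hypothetical. Each call to next_stage applies clause (2) to every pair of terms built so far.

```python
LETTERS = ["a", "b", "c", "d"]

def next_stage(terms):
    """Apply clause (2) to every pair of terms constructed so far."""
    new_terms = {"[" + s1 + "◦" + s2 + "]" for s1 in terms for s2 in terms}
    return terms | new_terms

def stages(n):
    """Return the set of nice terms constructed after n steps (n >= 1)."""
    terms = set(LETTERS)          # step 1: clause (1), the letters
    for _ in range(n - 1):        # each further step: clause (2)
        terms = next_stage(terms)
    return terms

print(sorted(stages(1)))              # ['a', 'b', 'c', 'd']
print(len(stages(2)))                 # 20: the 4 letters plus 16 pairs
print("[a◦[a◦a]]" in stages(3))       # True
print("[[b◦c]◦[d◦b]]" in stages(3))   # True
```

Clause (3) corresponds to the fact that nothing enters the set except through these two routes: a string such as [[c◦d][ never shows up at any stage.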

When proving claims about all of an inductively defined set, 
the strong form of induction becomes particularly important. For 
instance, suppose we want to prove that for every nice term of 
length n, the number of [ in it is < n/2. This can be seen as a 
claim about all n: for every n, the number of [ in any nice term 
of length n is < n/2. 


Proof. To prove this result by (strong) induction, we have to show 
that the following conditional claim is true: 


If for every l < k, any nice term of length l has < l/2 
[’s, then any nice term of length k has < k/2 [’s. 


To show this conditional, assume that its antecedent is true, i.e., 
assume that for any l < k, nice terms of length l contain < l/2 
[’s. We call this assumption the inductive hypothesis. We want 
to show the same is true for nice terms of length k. 

So suppose t is a nice term of length k. Because nice terms 
are inductively defined, we have two cases: (1) t is a letter by 
itself, or (2) t is [s_1 ◦ s_2] for some nice terms s_1 and s_2. 


1. t is a letter. Then k = 1, and the number of [ in t is 0. 
Since 0 < 1/2, the claim holds. 

2. t is [s_1 ◦ s_2] for some nice terms s_1 and s_2. Let’s let l_1 be the 
length of s_1 and l_2 be the length of s_2. Then the length k of 
t is l_1 + l_2 + 3 (the lengths of s_1 and s_2 plus three symbols 
[, ◦, ]). Since l_1 + l_2 + 3 is always greater than l_1, l_1 < k. 
Similarly, l_2 < k. That means that the induction hypothesis 
applies to the terms s_1 and s_2: the number m_1 of [ in s_1 is 
< l_1/2, and the number m_2 of [ in s_2 is < l_2/2. 


The number of [ in t is the number of [ in s_1, plus the 
number of [ in s_2, plus 1, i.e., it is m_1 + m_2 + 1. Since 
m_1 < l_1/2 and m_2 < l_2/2 we have: 

m_1 + m_2 + 1 < l_1/2 + l_2/2 + 1 = (l_1 + l_2 + 2)/2 < (l_1 + l_2 + 3)/2 = k/2. 


In each case, we’ve shown that the number of [ in t is < k/2 (on 
the basis of the inductive hypothesis). By strong induction, the 
proposition follows. □ 
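
The bound just proved can be tested on all nice terms built in a few rounds of the inductive definition. The Python sketch below is illustrative only: it assumes terms are strings without spaces, with the connecting symbol written ◦, and the function names are our own. The length of a term is then simply the length of the string, as in the proof.

```python
LETTERS = ["a", "b", "c", "d"]

def nice_terms(steps):
    """Nice terms built in at most `steps` rounds of the inductive definition."""
    terms = set(LETTERS)
    for _ in range(steps - 1):
        terms |= {"[" + s1 + "◦" + s2 + "]" for s1 in terms for s2 in terms}
    return terms

def bound_holds(t):
    """Check that the number of [ in t is strictly less than half its length."""
    return t.count("[") < len(t) / 2

print(all(bound_holds(t) for t in nice_terms(3)))  # True
print(bound_holds("[["))                           # False, but [[ is not a nice term
```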


B.5 Structural Induction 


So far we have used induction to establish results about all natural 
numbers. But a corresponding principle can be used directly to 
prove results about all elements of an inductively defined set. 
This is often called structural induction, because it depends on the 
structure of the inductively defined objects. 

Generally, an inductive definition is given by (a) a list of “initial” 
elements of the set and (b) a list of operations which produce 
new elements of the set from old ones. In the case of nice terms, 
for instance, the initial objects are the letters. We only have one 
operation: 


o(s_1, s_2) = [s_1 ◦ s_2] 


You can even think of the natural numbers N themselves as being 
given by an inductive definition: the initial object is 0, and the 
operation is the successor function x + 1. 

In order to prove something about all elements of an inductively 
defined set, i.e., that every element of the set has a property 
P, we must: 


1. Prove that the initial objects have P 


2. Prove that for each operation o, if the arguments have P, 
so does the result. 


For instance, in order to prove something about all nice terms, 
we would prove that it is true about all letters, and that it is true 
about [s_1 ◦ s_2] provided it is true of s_1 and s_2 individually. 


Proposition B.5. The number of [ equals the number of ] in any
nice term $t$.


Proof. We use structural induction. Nice terms are inductively
defined, with letters as initial objects and the operation $o$ for
constructing new nice terms out of old ones.


1. The claim is true for every letter, since the number of [ in
a letter by itself is 0 and the number of ] in it is also 0.


2. Suppose the number of [ in $s_1$ equals the number of ], and
the same is true for $s_2$. The number of [ in $o(s_1, s_2)$, i.e.,
in $[s_1 \circ s_2]$, is the sum of the number of [ in $s_1$ and $s_2$ plus
one. The number of ] in $o(s_1, s_2)$ is the sum of the number
of ] in $s_1$ and $s_2$ plus one. Thus, the number of [ in $o(s_1, s_2)$
equals the number of ] in $o(s_1, s_2)$. □
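The two cases of this proof correspond to the two branches of a recursive function. As a sketch, assume a nice term is represented in Python either as a one-letter string or as a pair `(s1, s2)` standing for $[s_1 \circ s_2]$; this representation and the function names are illustrative choices. The counting functions below recurse on the way the term was built, just as the proof does.

```python
def count_open(t):
    """Number of '[' in the nice term t, by recursion on how t was built."""
    if isinstance(t, str):  # initial object: a letter contains no brackets
        return 0
    s1, s2 = t  # t stands for [s1∘s2]
    return count_open(s1) + count_open(s2) + 1


def count_close(t):
    """Number of ']' in the nice term t, by the same recursion."""
    if isinstance(t, str):
        return 0
    s1, s2 = t
    return count_close(s1) + count_close(s2) + 1


def render(t):
    """Write the term out as a string, to compare against a direct count."""
    if isinstance(t, str):
        return t
    s1, s2 = t
    return f"[{render(s1)}∘{render(s2)}]"


example = (("a", "b"), ("c", ("d", "a")))
```

Because both functions add exactly one in the compound case and return 0 for letters, they agree on every term, which is the content of the induction step.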


Let’s give another proof by structural induction: a proper
initial segment of a string $t$ of symbols is any string $s$ that agrees
with $t$ symbol by symbol, read from the left, but $t$ is longer. So,
e.g., $[a \circ$ is a proper initial segment of $[a \circ b]$, but neither are
$[b \circ$ (they disagree at the second symbol) nor $[a \circ b]$ (they are
the same length).




Proposition B.6. Every proper initial segment of a nice term $t$ has
more [ than ].


Proof. By induction on $t$:


1. $t$ is a letter by itself: Then $t$ has no proper initial segments.


2. $t = [s_1 \circ s_2]$ for some nice terms $s_1$ and $s_2$. If $r$ is a proper
initial segment of $t$, there are a number of possibilities:

   a) $r$ is just [: Then $r$ has one more [ than it does ].

   b) $r$ is $[r_1$ where $r_1$ is a proper initial segment of $s_1$: Since
      $s_1$ is a nice term, by induction hypothesis, $r_1$ has more
      [ than ] and the same is true for $[r_1$.

   c) $r$ is $[s_1$ or $[s_1 \circ$: By the previous result, the number
      of [ and ] in $s_1$ are equal; so the number of [ in $[s_1$ or
      $[s_1 \circ$ is one more than the number of ].

   d) $r$ is $[s_1 \circ r_2$ where $r_2$ is a proper initial segment of $s_2$:
      By induction hypothesis, $r_2$ contains more [ than ]. By
      the previous result, the number of [ and of ] in $s_1$ are
      equal. So the number of [ in $[s_1 \circ r_2$ is greater than
      the number of ].

   e) $r$ is $[s_1 \circ s_2$: By the previous result, the number of [
      and ] in $s_1$ are equal, and the same for $s_2$. So there is
      one more [ in $[s_1 \circ s_2$ than there are ]. □
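The claim about proper initial segments can be checked mechanically on small cases. The sketch below again assumes nice terms are strings over a, b, c, d without spaces, and the helper names are illustrative. Following the proof, the empty string is not counted as a proper initial segment, so a single letter has none.

```python
from itertools import product

LETTERS = "abcd"  # assumption: the letters available for building nice terms


def nice_terms(max_depth):
    """Return all nice terms (as strings) whose bracket nesting is at most max_depth."""
    terms = list(LETTERS)
    for _ in range(max_depth):
        terms = list(LETTERS) + [f"[{s1}∘{s2}]" for s1, s2 in product(terms, repeat=2)]
    return terms


def proper_initial_segments(t):
    """Non-empty prefixes of t that are strictly shorter than t."""
    return [t[:i] for i in range(1, len(t))]


def has_more_open(r):
    """True if r contains more '[' than ']'."""
    return r.count("[") > r.count("]")


all_ok = all(
    has_more_open(r) for t in nice_terms(2) for r in proper_initial_segments(t)
)
```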


B.6 Relations and Functions 


When we have defined a set of objects (such as the natural num- 
bers or the nice terms) inductively, we can also define relations on 
these objects by induction. For instance, consider the following 
idea: a nice term $t_1$ is a subterm of a nice term $t_2$ if it occurs as
a part of it. Let’s use a symbol for it: $t_1 \sqsubseteq t_2$. Every nice term
is a subterm of itself, of course: $t \sqsubseteq t$. We can give an inductive
definition of this relation as follows:

Definition B.7. The relation of a nice term $t_1$ being a subterm
of $t_2$, $t_1 \sqsubseteq t_2$, is defined by induction on $t_2$ as follows:


1. If $t_2$ is a letter, then $t_1 \sqsubseteq t_2$ iff $t_1 = t_2$.


2. If $t_2$ is $[s_1 \circ s_2]$, then $t_1 \sqsubseteq t_2$ iff $t_1 = t_2$, $t_1 \sqsubseteq s_1$, or $t_1 \sqsubseteq s_2$.


This definition, for instance, will tell us that $a \sqsubseteq [b \circ a]$. For
(2) says that $a \sqsubseteq [b \circ a]$ iff $a = [b \circ a]$, or $a \sqsubseteq b$, or $a \sqsubseteq a$. The
first two are false: $a$ clearly isn’t identical to $[b \circ a]$, and by (1),
$a \sqsubseteq b$ iff $a = b$, which is also false. However, also by (1), $a \sqsubseteq a$ iff
$a = a$, which is true.
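The two clauses of the definition translate directly into a recursive function. In this sketch a nice term is assumed to be either a one-letter Python string or a pair `(s1, s2)` standing for $[s_1 \circ s_2]$; the name `is_subterm` is illustrative.

```python
def is_subterm(t1, t2):
    """Decide whether t1 is a subterm of t2, following clauses (1) and (2)."""
    if isinstance(t2, str):
        # Clause (1): t2 is a letter, so t1 is a subterm iff t1 = t2.
        return t1 == t2
    # Clause (2): t2 is [s1∘s2].
    s1, s2 = t2
    return t1 == t2 or is_subterm(t1, s1) or is_subterm(t1, s2)


b_a = ("b", "a")  # the nice term [b∘a]
```

Evaluating `is_subterm("a", b_a)` follows the same steps as the worked example: the identity test fails, the test against `"b"` fails, and the test against `"a"` succeeds. The recursion is well defined only because a pair has exactly one first and one second component, which is the uniqueness property discussed next.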

It’s important to note that the success of this definition
depends on a fact that we haven’t proved yet: every nice term $t$ is
either a letter by itself, or there are uniquely determined nice terms
$s_1$ and $s_2$ such that $t = [s_1 \circ s_2]$. “Uniquely determined” here
means that if $t = [s_1 \circ s_2]$ it isn’t also $= [r_1 \circ r_2]$ with $s_1 \neq r_1$ or
$s_2 \neq r_2$. If this were the case, then clause (2) might come in conflict
with itself: reading $t_2$ as $[s_1 \circ s_2]$ we might get $t_1 \sqsubseteq t_2$, but if we
read $t_2$ as $[r_1 \circ r_2]$ we might get not $t_1 \sqsubseteq t_2$. Before we prove that
this can’t happen, let’s look at an example where it can happen.


Definition B.8. Define bracketless terms inductively by

1. Every letter is a bracketless term.


2. If $s_1$ and $s_2$ are bracketless terms, then $s_1 \circ s_2$ is a bracketless
term.


3. Nothing else is a bracketless term. 


Bracketless terms are, e.g., $a$, $b \circ d$, $b \circ a \circ b$. Now if we defined
“subterm” for bracketless terms the way we did above, the second
clause would read


If $t_2 = s_1 \circ s_2$, then $t_1 \sqsubseteq t_2$ iff $t_1 = t_2$, $t_1 \sqsubseteq s_1$, or $t_1 \sqsubseteq s_2$.

Now $b \circ a \circ b$ is of the form $s_1 \circ s_2$ with

$s_1 = b$ and $s_2 = a \circ b$.

It is also of the form $r_1 \circ r_2$ with

$r_1 = b \circ a$ and $r_2 = b$.


Now is $a \circ b$ a subterm of $b \circ a \circ b$? The answer is yes if we go by
the first reading, and no if we go by the second.
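The conflict can be made explicit by trying every reading. The sketch below assumes bracketless terms are Python strings such as `"b∘a∘b"` with no spaces; the function name `possible_answers` is illustrative. It applies the attempted clause under each way of splitting $t_2$ at an occurrence of ∘ and collects all verdicts that can result.

```python
OP = "∘"


def possible_answers(t1, t2):
    """Set of verdicts the attempted 'subterm' clause can give for t1 and t2."""
    if OP not in t2:
        # t2 is a letter: the only test is identity.
        return {t1 == t2}
    if t1 == t2:
        # The identity disjunct holds no matter how t2 is read.
        return {True}
    answers = set()
    for i, ch in enumerate(t2):
        if ch == OP:
            s1, s2 = t2[:i], t2[i + 1:]  # one reading of t2 as s1∘s2
            for x in possible_answers(t1, s1):
                for y in possible_answers(t1, s2):
                    answers.add(x or y)
    return answers


verdicts = possible_answers("a∘b", "b∘a∘b")
```

A set with two elements means the clause does not determine a unique answer, which is the situation described above.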

The property that the way a nice term is built up from other 
nice terms is unique is called unique readability. Since inductive 
definitions of relations for such inductively defined objects are 
important, we have to prove that it holds. 


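Proposition B.9. If $t$ is a nice term, then either $t$ is a letter, or
there are uniquely determined nice terms $s_1$, $s_2$ such that
$t = [s_1 \circ s_2]$.


Proof. If $t$ is a letter by itself, the condition is satisfied. So assume
$t$ isn’t a letter by itself. We can tell from the inductive definition
that then $t$ must be of the form $[s_1 \circ s_2]$ for some nice terms $s_1$
and $s_2$. It remains to show that these are uniquely determined,
i.e., if $t = [r_1 \circ r_2]$, then $s_1 = r_1$ and $s_2 = r_2$.

So suppose $t = [s_1 \circ s_2]$ and also $t = [r_1 \circ r_2]$ for nice terms $s_1$,
$s_2$, $r_1$, $r_2$. We have to show that $s_1 = r_1$ and $s_2 = r_2$. First, $s_1$ and $r_1$
must be identical, for otherwise one is a proper initial segment of
the other. But by Proposition B.6, that is impossible if $s_1$ and $r_1$
are both nice terms: a proper initial segment of a nice term has more
[ than ], whereas a nice term has equally many of each.
But if $s_1 = r_1$, then clearly also $s_2 = r_2$. □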


We can also define functions inductively: e.g., we can define 
the function f that maps any nice term to the maximum depth 
of nested [...] in it as follows: 
Definition B.10. The depth of a nice term, $f(t)$, is defined
inductively as follows:

\[
f(t) = \begin{cases}
0 & \text{if $t$ is a letter} \\
\max(f(s_1), f(s_2)) + 1 & \text{if $t = [s_1 \circ s_2]$.}
\end{cases}
\]

For instance

\begin{align*}
f([a \circ b]) &= \max(f(a), f(b)) + 1 \\
&= \max(0, 0) + 1 = 1, \text{ and} \\
f([[a \circ b] \circ c]) &= \max(f([a \circ b]), f(c)) + 1 \\
&= \max(1, 0) + 1 = 2.
\end{align*}
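The inductive definition of depth reads almost verbatim as a recursive function. This sketch assumes a nice term is either a one-letter Python string or a pair `(s1, s2)` standing for $[s_1 \circ s_2]$; the name `depth` plays the role of $f$ and is an illustrative choice.

```python
def depth(t):
    """Maximum nesting depth of brackets in the nice term t."""
    if isinstance(t, str):
        # t is a letter
        return 0
    s1, s2 = t  # t stands for [s1∘s2]
    return max(depth(s1), depth(s2)) + 1


a_b = ("a", "b")      # [a∘b]
a_b_c = (a_b, "c")    # [[a∘b]∘c]
```

With this representation, `depth(a_b)` and `depth(a_b_c)` reproduce the two computations above.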


Here, of course, we assume that $s_1$ and $s_2$ are nice terms, and
make use of the fact that every nice term is either a letter or of
the form $[s_1 \circ s_2]$. It is again important that it can be of this form
in only one way. To see why, consider again the bracketless terms
we defined earlier. The corresponding “definition” would be:

\[
g(t) = \begin{cases}
0 & \text{if $t$ is a letter} \\
\max(g(s_1), g(s_2)) + 1 & \text{if $t = s_1 \circ s_2$.}
\end{cases}
\]


Now consider the bracketless term $a \circ b \circ c \circ d$. It can be read in
more than one way, e.g., as $s_1 \circ s_2$ with

$s_1 = a$ and $s_2 = b \circ c \circ d$,

or as $r_1 \circ r_2$ with

$r_1 = a \circ b$ and $r_2 = c \circ d$.

Calculating $g$ according to the first way of reading it would give

\begin{align*}
g(s_1 \circ s_2) &= \max(g(a), g(b \circ c \circ d)) + 1 \\
&= \max(0, 2) + 1 = 3
\end{align*}

while according to the other reading we get

\begin{align*}
g(r_1 \circ r_2) &= \max(g(a \circ b), g(c \circ d)) + 1 \\
&= \max(1, 1) + 1 = 2.
\end{align*}


But a function must always yield a unique value; so our “defini- 
tion” of g doesn’t define a function at all. 
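To see every value the attempted definition can produce, one can enumerate all readings. The sketch below assumes bracketless terms are Python strings such as `"a∘b∘c∘d"` with no spaces; `possible_values` is an illustrative name. It returns the set of all numbers obtainable by splitting the term at some occurrence of ∘ and recursing on both parts.

```python
OP = "∘"


def possible_values(t):
    """All values the attempted definition of g can assign to the bracketless term t."""
    if OP not in t:
        # t is a letter
        return {0}
    values = set()
    for i, ch in enumerate(t):
        if ch == OP:
            left, right = t[:i], t[i + 1:]  # one reading of t as s1∘s2
            for x in possible_values(left):
                for y in possible_values(right):
                    values.add(max(x, y) + 1)
    return values


values = possible_values("a∘b∘c∘d")
```

For a genuine function the returned set would always contain exactly one number; here it contains both 2 and 3.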


Problems

Problem B.1. Define the set of supernice terms by

1. Any letter a, b, c, d is a supernice term.
2. If $s$ is a supernice term, then so is $[s]$.
3. If $s_1$ and $s_2$ are supernice terms, then so is $[s_1 \circ s_2]$.
4. Nothing else is a supernice term.


Show that the number of [ in a supernice term $t$ of length $n$ is
$< n/2 + 1$.


Problem B.2. Prove by structural induction that no nice term 
starts with ]. 


Problem B.3. Give an inductive definition of the function $l$,
where $l(t)$ is the number of symbols in the nice term $t$.


Problem B.4. Prove by structural induction on nice terms $t$ that
$f(t) < l(t)$ (where $l(t)$ is the number of symbols in $t$ and $f(t)$ is
the depth of $t$ as defined in Definition B.10).
Biographies


C.1 Georg Cantor 


An early biography of Georg 
Cantor (GAY-org KAHN-tor)
claimed that he was born and 
found on a ship that was sail- 
ing for Saint Petersburg, Rus- 
sia, and that his parents were 
unknown. This, however, is 
not true; although he was 
born in Saint Petersburg in 
1845. 

Cantor received his doc- 
torate in mathematics at the 
University of Berlin in 1867. 
He is known for his work in 
set theory, and is credited 
with founding set theory as a 
distinctive research discipline. 
He was the first to prove that 
there are infinite sets of different sizes. His theories, and espe- 
cially his theory of infinities, caused much debate among mathe- 
maticians at the time, and his work was controversial. 


Fig. C.1: Georg Cantor 
Cantor’s religious beliefs and his mathematical work were in-
extricably tied; he even claimed that the theory of transfinite num-
bers had been communicated to him directly by God. In later
life, Cantor suffered from mental illness. Beginning in 1894, and 
more frequently towards his later years, Cantor was hospitalized. 
The heavy criticism of his work, including a falling out with the 
mathematician Leopold Kronecker, led to depression and a lack 
of interest in mathematics. During depressive episodes, Cantor 
would turn to philosophy and literature, and even published a 
theory that Francis Bacon was the author of Shakespeare’s plays. 

Cantor died on January 6, 1918, in a sanatorium in Halle. 


Further Reading For full biographies of Cantor, see Dauben 
(1990) and Grattan-Guinness (1971). Cantor’s radical views are 
also described in the BBC Radio 4 program A Brief History of 
Mathematics (du Sautoy, 2014). If you'd like to hear about Can- 
tor’s theories in rap form, see Rose (2012). 


C.2 Alonzo Church 


Alonzo Church was born in Washington, DC on June 14, 1903. In early
childhood, an air gun incident left Church blind in one eye. He
finished preparatory school in Connecticut in 1920 and began his
university education at Princeton that same year. He completed his
doctoral studies in 1927. After a couple of years abroad, Church
returned to Princeton. Church was known to be exceedingly polite and
careful. His blackboard writing was immaculate, and he would preserve
important papers by carefully covering them in Duco cement (a clear
glue). Outside of his academic pursuits, he enjoyed reading science
fiction magazines and was not afraid to write to the editors if he
spotted any inaccuracies in the writing.


Fig. C.2: Alonzo Church

Church’s academic achievements were great. Together with 
his students Stephen Kleene and Barkley Rosser, he developed 
a theory of effective calculability, the lambda calculus, indepen- 
dently of Alan Turing’s development of the Turing machine. The 
two definitions of computability are equivalent, and give rise to 
what is now known as the Church-Turing Thesis, that a function 
of the natural numbers is effectively computable if and only if 
it is computable via Turing machine (or lambda calculus). He 
also proved what is now known as Church’s Theorem: The deci- 
sion problem for the validity of first-order formulas is unsolvable. 

Church continued his work into old age. In 1967 he left 
Princeton for UCLA, where he was professor until his retirement 
in 1990. Church passed away on August 11, 1995 at the age of 92.


Further Reading For a brief biography of Church, see En- 
derton (2019). Church’s original writings on the lambda calcu- 
lus and the Entscheidungsproblem (Church’s Thesis) are Church 
(1936a,b). Aspray (1984) records an interview with Church about 
the Princeton mathematics community in the 1930s. Church 
wrote a series of book reviews for the Journal of Symbolic Logic from
1936 until 1979. They are all archived on John MacFarlane’s web- 
site (MacFarlane, 2015). 


C.3 Gerhard Gentzen 


Gerhard Gentzen is known primarily as the creator of structural
proof theory, and specifically for creating the natural deduc-
tion and sequent calculus derivation systems. He was born on
November 24, 1909 in Greifswald, Germany. Gerhard was home- 
schooled for three years before attending preparatory school, 
where he was behind most of his classmates in terms of educa-
tion. Despite this, he was a brilliant student and showed a strong
aptitude for mathematics. His interests were varied, and he, for
instance, also wrote poems for his mother and plays for the school
theatre.

Gentzen began his university studies at the University of Greifswald,
but moved around to Göttingen, Munich, and Berlin. He received his
doctorate in 1933 from the University of Göttingen under Hermann
Weyl. (Paul Bernays supervised most of his work, but was dismissed
from the university by the Nazis.) In 1934, Gentzen began work as an
assistant to David Hilbert. That same year he developed the sequent
calculus and natural deduction derivation systems, in his papers
Untersuchungen über das logische Schließen I–II [Investigations Into
Logical Deduction I–II]. He proved the consistency of the Peano axioms
in 1936.


Fig. C.3: Gerhard Gentzen

Gentzen’s relationship with the Nazis is complicated. At the
same time his mentor Bernays was forced to leave Germany,
Gentzen joined the university branch of the SA, the Nazi paramil- 
itary organization. Like many Germans, he was a member of 
the Nazi party. During the war, he served as a telecommunica- 
tions officer for the air intelligence unit. However, in 1942 he was 
released from duty due to a nervous breakdown. It is unclear 
whether or not Gentzen’s loyalties lay with the Nazi party, or 
whether he joined the party in order to ensure academic success. 

In 1943, Gentzen was offered an academic position at the 
Mathematical Institute of the German University of Prague, 
which he accepted. However, in 1945 the citizens of Prague re- 
volted against German occupation. Soviet forces arrived in the 
city and arrested all the professors at the university. Because of 
his membership in Nazi organizations, Gentzen was taken to a
forced labour camp. He died of malnutrition while in his cell on 
August 4, 1945 at the age of 35. 


Further Reading For a full biography of Gentzen, see Menzler- 
Trott (2007). An interesting read about mathematicians under 
Nazi rule, which gives a brief note about Gentzen’s life, is given by 
Segal (2014). Gentzen’s papers on logical deduction are available 
in the original German (Gentzen, 1935a,b). English translations
of Gentzen’s papers have been collected in a single volume by 
Szabo (1969), which also includes a biographical sketch. 


C.4 Kurt Gödel


Kurt Gödel (GER-dle) was born on April 28, 1906 in Brünn in the
Austro-Hungarian empire (now Brno in the Czech Republic). Due to his
inquisitive and bright nature, young Kurtele was often called “Der
kleine Herr Warum” (Little Mr. Why) by his family. He excelled in
academics from primary school onward, where he got less than the
highest grade only in mathematics. Gödel was often absent from school
due to poor health and was exempt from physical education. He was
diagnosed with rheumatic fever during his childhood. Throughout his
life, he believed this permanently affected his heart despite medical
assessment saying otherwise.


Fig. C.4: Kurt Gödel
Gödel began studying at the University of Vienna in 1924
and completed his doctoral studies in 1929. He first intended to
study physics, but his interests soon moved to mathematics and
especially logic, in part due to the influence of the philosopher
Rudolf Carnap. His dissertation, written under the supervision
of Hans Hahn, proved the completeness theorem of first-order
predicate logic with identity (Gödel, 1929). Only a year later, he
obtained his most famous results, the first and second incom-
pleteness theorems (published in Gödel 1931). During his time
in Vienna, Gödel was heavily involved with the Vienna Circle,
a group of scientifically-minded philosophers that included Car-
nap, whose work was especially influenced by Gödel’s results.

In 1938, Gödel married Adele Nimbursky. His parents were
not pleased: not only was she six years older than him and al-
ready divorced, but she worked as a dancer in a nightclub. Social
pressures did not affect Gödel, however, and they remained hap-
pily married until his death.

After Nazi Germany annexed Austria in 1938, Gödel and
Adele emigrated to the United States, where he took up a po-
sition at the Institute for Advanced Study in Princeton, New Jer-
sey. Despite his introversion and eccentric nature, Gödel’s time
at Princeton was collaborative and fruitful. He published essays
in set theory, philosophy and physics. Notably, he struck up a par-
ticularly strong friendship with his colleague at the IAS, Albert
Einstein.

In his later years, Gödel’s mental health deteriorated. His
wife’s hospitalization in 1977 meant she was no longer able to
cook his meals for him. Having suffered from mental health issues
throughout his life, he succumbed to paranoia. Deathly afraid of
being poisoned, Gödel refused to eat. He died of starvation on
January 14, 1978, in Princeton.


Further Reading For a complete biography of Gödel, see John
Dawson (1997). For further biographical
pieces, as well as essays about Gödel’s contributions to logic and
philosophy, see Wang (1990), Baaz et al. (2011), Takeuti et al.
(2003), and Sigmund et al. (2007).

Gödel’s PhD thesis is available in the original German (Gödel,
1929). The original text of the incompleteness theorems is
(Gödel, 1931). All of Gödel’s published and unpublished writ-
ings, as well as a selection of correspondence, are available in
English in his Collected Papers (Feferman et al., 1986, 1990).

For a detailed treatment of Gödel’s incompleteness theorems,
see Smith (2013). For an informal, philosophical discussion
of Gödel’s theorems, see Mark Linsenmayer’s podcast (Linsen-
mayer, 2014).


C.5 Emmy Noether 


Emmy Noether (NER-ter) was born in Erlangen, Germany, on March 23,
1882, to an upper-middle class scholarly family. Hailed as the “mother
of modern algebra,” Noether made groundbreaking contributions to both
mathematics and physics, despite significant barriers to women’s
education. In Germany at the time, young girls were meant to be
educated in arts and were not allowed to attend college preparatory
schools. However, after auditing classes at the Universities of
Göttingen and Erlangen (where her father was professor of
mathematics), Noether was eventually able to enroll as a student at
Erlangen in 1904, when their policy was updated to allow female
students. She received her doctorate in mathematics in 1907.


Fig. C.5: Emmy Noether

Despite her qualifications, Noether experienced much resis-
tance during her career. From 1908 to 1915, she taught at Erlangen
without pay. During this time, she caught the attention of David 
Hilbert, one of the world’s foremost mathematicians of the time, 
who invited her to Göttingen. However, women were prohibited
from obtaining professorships, and she was only able to lecture 
under Hilbert’s name, again without pay. During this time she 
proved what is now known as Noether’s theorem, which is still 
used in theoretical physics today. Noether was finally granted 
the right to teach in 1919. Hilbert’s response to continued resis- 
tance of his university colleagues reportedly was: “Gentlemen, 
the faculty senate is not a bathhouse.” 

In the later 1920s, she concentrated on work in abstract alge- 
bra, and her contributions revolutionized the field. In her proofs 
she often made use of the so-called ascending chain condition, 
which states that there is no infinite strictly increasing chain of 
certain sets. For instance, certain algebraic structures now known 
as Noetherian rings have the property that there are no infinite 
sequences of ideals $I_1 \subsetneq I_2 \subsetneq \dots$. The condition can be general-
ized to any partial order (in algebra, it concerns the special case
of ideals ordered by the subset relation), and we can also con- 
sider the dual descending chain condition, where every strictly 
decreasing sequence in a partial order eventually ends. If a par- 
tial order satisfies the descending chain condition, it is possible 
to use induction along this order in a similar way in which we 
can use induction along the < order on N. Such orders are called 
well-founded or Noetherian, and the corresponding proof principle 
Noetherian induction. 

Noether was Jewish, and when the Nazis came to power in 
1933, she was dismissed from her position. Luckily, Noether was 
able to emigrate to the United States for a temporary position at 
Bryn Mawr, Pennsylvania. During her time there she also lectured 
at Princeton, although she found the university to be unwelcom- 
ing to women (Dick, 1981, 81). In 1935, Noether underwent an 
operation to remove a uterine tumour. She died from an infection
as a result of the surgery, and was buried at Bryn Mawr. 


Further Reading For a biography of Noether, see Dick (1981). 
The Perimeter Institute for Theoretical Physics has their lectures 
on Noether’s life and influence available online (Institute, 2015). 
If you’re tired of reading, Stuff You Missed in History Class has a
podcast on Noether’s life and influence (Frey and Wilson, 2015). 
The collected works of Noether are available in the original Ger- 
man (Jacobson, 1983). 


C.6 Bertrand Russell 


Bertrand Russell is hailed as 
one of the founders of mod- 
ern analytic philosophy. Born 
May 18, 1872, Russell was 
not only known for his work 
in philosophy and logic, but 
wrote many popular books in 
various subject areas. He was 
also an ardent political ac- 
tivist throughout his life. 
Russell was born in Trel- 
lech, Monmouthshire, Wales. 
His parents were members of 
the British nobility. They 
were free-thinkers, and even 
made friends with the radicals 
in Boston at the time. Unfor- 
tunately, Russell’s parents died when he was young, and Russell 
was sent to live with his grandparents. There, he was given a 
religious upbringing (something his parents had wanted to avoid 
at all costs). His grandmother was very strict in all matters of
morality. During adolescence he was mostly homeschooled by
private tutors.


Fig. C.6: Bertrand Russell

Russell’s influence in analytic philosophy, and especially 
logic, is tremendous. He studied mathematics and philosophy at 
Trinity College, Cambridge, where he was influenced by the math- 
ematician and philosopher Alfred North Whitehead. In 1910, 
Russell and Whitehead published the first volume of Principia 
Mathematica, where they championed the view that mathematics 
is reducible to logic. He went on to publish hundreds of books, 
essays and political pamphlets. In 1950, he won the Nobel Prize 
for literature. 

Russell was deeply entrenched in politics and social ac-
tivism. During World War I he was arrested and sent to prison for
six months due to pacifist activities and protest. While in prison, 
he was able to write and read, and claims to have found the ex- 
perience “quite agreeable.” He remained a pacifist throughout 
his life, and was again incarcerated for attending a nuclear dis- 
armament rally in 1961. He also survived a plane crash in 1948, 
where the only survivors were those sitting in the smoking sec- 
tion. As such, Russell claimed that he owed his life to smoking. 
Russell was married four times, but had a reputation for carrying 
on extra-marital affairs. He died on February 2, 1970 at the age 
of 97 in Penrhyndeudraeth, Wales. 


Further Reading Russell wrote an autobiography in three 
parts, spanning his life from 1872 to 1967 (Russell, 1967, 1968,
1969). The Bertrand Russell Research Centre at McMaster Uni- 
versity is home of the Bertrand Russell archives. See their website 
at Duncan (2015), for information on the volumes of his collected 
works (including searchable indexes), and archival projects. Rus- 
sell’s paper On Denoting (Russell, 1905) is a classic of 20th century 
analytic philosophy. 

The Stanford Encyclopedia of Philosophy entry on Russell 
(Irvine, 2015) has sound clips of Russell speaking on Desire and 
Political theory. Many video interviews with Russell are available 
online. To see him talk about smoking and being involved in a
plane crash, e.g., see Russell (n.d.). Some of Russell’s works, 
including his Introduction to Mathematical Philosophy are available 
as free audiobooks on LibriVox (n.d.). 


C.7 Alfred Tarski 


Alfred Tarski was born on 
January 14, 1901 in Warsaw,
Poland (then part of the
Russian Empire). Described 
as “Napoleonic,” Tarski was 
boisterous, talkative, and in- 
tense. His energy was often 
reflected in his lectures—he 
once set fire to a wastebasket 
while disposing of a cigarette 
during a lecture, and was for- 
bidden from lecturing in that 
building again. 

Tarski had a thirst for knowledge from a young age. Although later in
life he would tell students that he studied logic because it was the
only class in which he got a B, his high school records show that he
got A’s across the board, even in logic. He studied at the University
of Warsaw from 1918 to 1924. Tarski first intended to study biology,
but became interested in mathematics, philosophy, and logic, as the
university was the center of the Warsaw School of Logic and
Philosophy. Tarski earned his doctorate in 1924 under the supervision
of Stanisław Leśniewski.


Fig. C.7: Alfred Tarski

Before emigrating to the United States in 1939, Tarski com-
pleted some of his most important work while working as a sec-
ondary school teacher in Warsaw. His work on logical conse-
quence and logical truth was written during this time. In 1939,
Tarski was visiting the United States for a lecture tour. During 
his visit, Germany invaded Poland, and because of his Jewish her- 
itage, Tarski could not return. His wife and children remained in 
Poland until the end of the war, but were then able to emigrate to 
the United States as well. Tarski taught at Harvard, the College
of the City of New York, the Institute for Advanced Study
at Princeton, and finally the University of California, Berkeley.
There he founded the multidisciplinary program in Logic and 
the Methodology of Science. Tarski died on October 26, 1983 at 
the age of 82. 


Further Reading For more on Tarski’s life, see the biogra- 
phy Alfred Tarski: Life and Logic (Feferman and Feferman, 2004). 
Tarski’s seminal works on logical consequence and truth are avail- 
able in English in (Corcoran, 1983). All of Tarski’s original works 
have been collected into a four volume series, (Tarski, 1981). 


C.8 Alan Turing 


Alan Turing was born in Maida Vale, London, on June 23, 1912. 
He is considered the father of theoretical computer science. Tur- 
ing’s interest in the physical sciences and mathematics started at 
a young age. However, as a boy his interests were not represented 
well in his schools, where emphasis was placed on literature and 
classics. Consequently, he did poorly in school and was repri- 
manded by many of his teachers. 

Turing attended King’s College, Cambridge as an undergrad- 
uate, where he studied mathematics. In 1936 Turing developed 
(what is now called) the Turing machine as an attempt to pre- 
cisely define the notion of a computable function and to prove 
the undecidability of the decision problem. He was beaten to 
the result by Alonzo Church, who proved the result via his own 
lambda calculus. Turing’s paper was still published with reference 
to Church’s result. Church invited Turing to Princeton, where he
spent 1936-1938, and obtained a doctorate under Church. 

Despite his interest in logic, Turing’s earlier interests in physical
sciences remained prevalent. His practical skills were put to work
during his service with the British cryptanalytic department at
Bletchley Park during World War II. Turing was a central figure in
cracking the cypher used by German Naval communications, the Enigma
code. Turing’s expertise in statistics and cryptography, together with
the introduction of electronic machinery, gave the team the ability to
crack the code by creating a decrypting machine called a “bombe.” His
ideas also helped in the creation of the world’s first programmable
electronic computer, the Colossus, also used at Bletchley Park to
break the German Lorenz cypher.


Fig. C.8: Alan Turing

Turing was gay. Nevertheless, in 1942 he proposed to Joan 
Clarke, one of his teammates at Bletchley Park, but later broke off 
the engagement and confessed to her that he was homosexual. He 
had several lovers throughout his lifetime, although homosexual 
acts were then criminal offences in the UK. In 1952, Turing’s 
house was burgled by a friend of his lover at the time, and when 
filing a police report, Turing admitted to having a homosexual 
relationship, under the impression that the government was on 
their way to legalizing homosexual acts. This was not true, and 
he was charged with gross indecency. Instead of going to prison, 
Turing opted for a hormone treatment that reduced libido. Turing 
was found dead on June 8, 1954, of a cyanide overdose—most 
likely suicide. He was given a royal pardon by Queen Elizabeth II
in 2013.


Further Reading For a comprehensive biography of Alan Tur- 
ing, see Hodges (2014). Turing’s life and work inspired a play, 
Breaking the Code, which was produced in 1996 for TV starring 
Derek Jacobi as Turing. The Imitation Game, an Academy Award 
nominated film starring Benedict Cumberbatch and Keira Knight-
ley, is also loosely based on Alan Turing’s life and time at Bletch-
ley Park (Tyldum, 2014).

Radiolab (2012) has several podcasts on Turing’s life and 
work. BBC Horizon’s documentary The Strange Life and Death 
of Dr. Turing is available to watch online (Sykes, 1992). (Theelen, 
2012) is a short video of a working LEGO Turing Machine— 
made to honour Turing’s centenary in 2012. 

Turing’s original paper on Turing machines and the decision
problem is Turing (1937).


C.9 Ernst Zermelo


Ernst Zermelo was born on July 27, 1871 in Berlin, Germany. 
He had five sisters, though his family suffered from poor health 
and only three survived to adulthood. His parents also passed 
away when he was young, leaving him and his siblings orphans 
when he was seventeen. Zermelo had a deep interest in the arts, 
and especially in poetry. He was known for being sharp, witty, 
and critical. His most celebrated mathematical achievements in- 
clude the introduction of the axiom of choice (in 1904), and his 
axiomatization of set theory (in 1908). 

Zermelo’s interests at university were varied. He took courses 
in physics, mathematics, and philosophy. Under the supervision 
of Hermann Schwarz, Zermelo completed his dissertation Inves- 
tigations in the Calculus of Variations in 1894 at the University of 
Berlin. In 1897, he decided to pursue more studies at the Univer-
sity of Göttingen, where he was heavily influenced by the founda-
tional work of David Hilbert. In 1899 he became eligible for pro-
fessorship, but did not get one until eleven years later, possibly
due to his strange demeanour and “nervous haste.” 

Zermelo finally received a paid professorship at the University of
Zurich in 1910, but was forced to retire in 1916 due to tuberculosis.
After his recovery, he was given an honorary professorship at the
University of Freiburg in 1921. During this time he worked on
foundational mathematics. He became irritated with the works of
Thoralf Skolem and Kurt Gödel, and publicly criticized their
approaches in his papers. He was dismissed from his position at
Freiburg in 1935, due to his unpopularity and his opposition to
Hitler’s rise to power in Germany.

The later years of Zermelo’s life were marked by isolation. Af- 
ter his dismissal in 1935, he abandoned mathematics. He moved 
to the country where he lived modestly. He married in 1944, and 
became completely dependent on his wife as he was going blind. 
Zermelo lost his sight completely by 1951. He passed away in 
Günterstal, Germany, on May 21, 1953.


Fig. C.9: Ernst Zermelo


Further Reading For a full biography of Zermelo, see Ebbinghaus
(2015). Zermelo’s seminal 1904 and 1908 papers are available to read
in the original German (Zermelo, 1904, 1908). Zermelo’s collected
works, including his writing on physics, are available in English
translation in (Ebbinghaus et al., 2010; Ebbinghaus and Kanamori,
2013).


The Greek Alphabet


Alpha    α  Α      Nu       ν  Ν
Beta     β  Β      Xi       ξ  Ξ
Gamma    γ  Γ      Omicron  ο  Ο
Delta    δ  Δ      Pi       π  Π
Epsilon  ε  Ε      Rho      ρ  Ρ
Zeta     ζ  Ζ      Sigma    σ  Σ
Eta      η  Η      Tau      τ  Τ
Theta    θ  Θ      Upsilon  υ  Υ
Iota     ι  Ι      Phi      φ  Φ
Kappa    κ  Κ      Chi      χ  Χ
Lambda   λ  Λ      Psi      ψ  Ψ
Mu       μ  Μ      Omega    ω  Ω

Glossary 


anti-symmetric R is anti-symmetric iff, whenever both Rxy and 
Ryx, then x = y; in other words: if x ≠ y then not Rxy
or not Ryx (see section 2.2). 

assumption A formula that stands topmost in a derivation, also 
called an initial formula. It may be discharged or undis- 
charged (see section 11.1). 

asymmetric R is asymmetric if for no pair x, y ∈ A we have Rxy
and Ryx (see section 2.4). 


bijection A function that is both surjective and injective (see 
section 3.2). 

binary relation A subset of A²; we write Rxy (or xRy) for ⟨x, y⟩ ∈
R (see section 2.1).

bound Occurrence of a variable within the scope of a quantifier 
that uses the same variable (see section 6.8). 


Cartesian product (A × B) Set of all pairs of elements of A and
B; A × B = {⟨x, y⟩ : x ∈ A and y ∈ B} (see section 1.5).

Church-Turing Theorem States that there is no Turing machine 
which decides if a given sentence of first-order logic is 
valid or not (see section 15.8). 

Church-Turing Thesis States that anything computable via
an effective procedure is Turing computable (see sec- 
tion 14.10). 




closed A set of sentences Γ is closed iff, whenever Γ ⊨ A then
A ∈ Γ. The set {A : Γ ⊨ A} is the closure of Γ (see
section 8.1).

compactness theorem States that every finitely satisfiable set of 
sentences is satisfiable (see section 12.9). 

complete consistent set A set of sentences is complete and con- 
sistent iff it is consistent, and for every sentence A either 
A or ¬A is in the set (see section 12.3).

completeness Property of a derivation system; it is complete if, 
whenever Γ entails A, then there is also a derivation that
establishes Γ ⊢ A; equivalently, iff every consistent set of
sentences is satisfiable (see section 12.1). 

completeness theorem States that first-order logic is complete: 
every consistent set of sentences is satisfiable. 

composition (g ∘ f) The function resulting from “chaining
together” f and g; (g ∘ f)(x) = g(f(x)) (see section 3.5).

connected R is connected if for all x, y ∈ A with x ≠ y,
either Rxy or Ryx (see section 2.2).

consistent In the sequent calculus, a set of sentences Γ is
consistent iff there is no derivation of a sequent Γ₀ ⇒ with
Γ₀ ⊆ Γ (see section 10.8). In natural deduction, Γ is
consistent iff Γ ⊬ ⊥ (see section 11.7). If Γ is not consistent,
it is inconsistent.

covered A structure in which every element of the domain is the 
value of some closed term (see section 7.2). 


decision problem Problem of deciding if a given sentence of first-
order logic is valid or not (see Church-Turing Theorem).

deduction theorem Relates entailment and provability of a sentence
from an assumption with that of a corresponding conditional. In the
semantic form (Theorem 7.29), it states that Γ ∪ {A} ⊨ B iff
Γ ⊨ A → B. In the proof-theoretic form, it states that
Γ ∪ {A} ⊢ B iff Γ ⊢ A → B.

derivability (Γ ⊢ A) In the sequent calculus, A is derivable
from Γ if there is a derivation of a sequent Γ₀ ⇒ A where
Γ₀ ⊆ Γ is a finite sequence of sentences in Γ (see
section 10.8). In natural deduction, A is derivable from Γ
if there is a derivation with end-formula A and in which
every assumption is either discharged or is in Γ (see
section 11.7).

derivation In the sequent calculus, a tree of sequents in which 
every sequent is either an initial sequent or follows from 
the sequents immediately above it by a rule of inference 
(see section 10.1). In natural deduction, a tree of formulas
in which every formula is either an assumption or
follows from the formulas immediately above it by a rule 
of inference (see section 11.1). 

difference (A \ B) The set of all elements of A which are not also
elements of B: A \ B = {x : x ∈ A and x ∉ B} (see
section 1.4).

discharged An assumption in a derivation may be discharged by 
an inference rule below it (the rule and the assumption 
are then assigned a matching label, e.g., [A]ⁿ). If it is not
discharged, it is called undischarged (see section 11.1). 

disjoint Two sets with no elements in common (see section 1.4).

domain (of a function) (dom(f)) The set of objects for which
a (partial) function is defined (see section 3.1).

domain (of a structure) (|M|) Non-empty set from which a
structure takes assignments and values of variables (see
section 7.2).


eigenvariable In the sequent calculus, a special constant symbol
in a premise of a ∃L or ∀R inference which may not appear
in the conclusion (see section 10.1). In natural deduction,
a special constant symbol in a premise of a ∃Elim or ∀Intro
inference which may not appear in the conclusion or any
undischarged assumption (see section 11.1).

entailment (Γ ⊨ A) A set of sentences Γ entails a sentence A
iff for every structure M with M ⊨ Γ, M ⊨ A (see
section 7.7).

enumeration A possibly infinite list of all elements of a set A;
formally a surjective function f : ℕ → A (see section 4.2).

equinumerous A and B are equinumerous iff there is a total bi- 
jection from A to B (see section 4.8). 

equivalence relation A reflexive, symmetric, and transitive
relation (see section 2.2).

extensionality (of satisfaction) Whether or not a formula A is 
satisfied depends only on the assignments to the non- 
logical symbols and free variables that actually occur 
in A. 

extensionality (of sets) Sets A and B are identical, A = B, iff 
every element of A is also an element of B, and vice 
versa (see section 1.1). 


finitely satisfiable Γ is finitely satisfiable iff every finite Γ₀ ⊆ Γ
is satisfiable (see section 12.9).

formula Expressions of a first-order language ℒ which express
relations or properties, or are true or false (see sec- 
tion 6.3). 

free An occurrence of a variable that is not bound (see sec- 
tion 6.8). 

free for A term t is free for x in A if none of the free occurrences
of x in A occur in the scope of a quantifier that binds a
variable in t (see section 6.9).

function (f : A → B) A mapping of each element of a domain
(of a function) A to an element of the codomain B (see 
section 3.1). 


graph (of a function) The relation R_f ⊆ A × B defined by R_f =
{⟨x, y⟩ : f(x) = y}, if f : A ⇸ B (see section 3.3).


halting problem The problem of determining (for any e, n) 
whether the Turing machine Mₑ halts for an input of n
strokes (see section 15.4). 


inconsistent see consistent.

injective f : A → B is injective iff for each y ∈ B there is at most
one x ∈ A such that f(x) = y; equivalently if whenever
x ≠ x′ then f(x) ≠ f(x′) (see section 3.2).

intersection (A ∩ B) The set of all things which are elements
of both A and B: A ∩ B = {x : x ∈ A ∧ x ∈ B} (see
section 1.4).

inverse function If f : A → B is a bijection, f⁻¹ : B → A is the
function with f⁻¹(y) = whatever unique x ∈ A is such
that f(x) = y (see section 3.4).

inverse relation (R⁻¹) The relation R “turned around”; R⁻¹ =
{⟨y, x⟩ : ⟨x, y⟩ ∈ R} (see section 2.6).

irreflexive R is irreflexive if, for no x ∈ A, Rxx (see section 2.4).


Löwenheim-Skolem Theorem States that every satisfiable set
of sentences has a countable model (see section 12.11).

linear order A connected partial order (see section 2.4).


model A structure in which every sentence in Γ is true is a model
of Γ (see section 8.2).


partial function (f : A ⇸ B) A partial function is a mapping
which assigns to every element of A at most one element
of B. If f assigns an element of B to x ∈ A, f(x) is
defined, and otherwise undefined (see section 3.6).

partial order A reflexive, anti-symmetric, transitive relation (see 
section 2.4). 

power set (℘(A)) The set consisting of all subsets of a set A,
℘(A) = {x : x ⊆ A} (see section 1.2).
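
For a finite set, the definition of the power set can be tried out directly. The following is a minimal Python sketch, not part of the text's formal development; the function name `power_set` is an assumption made for this illustration, and subsets are represented as frozensets so that they can themselves be elements of a set.

```python
from itertools import chain, combinations

def power_set(a):
    """Return the set of all subsets of the finite set a, as frozensets."""
    elems = list(a)
    # For each size k from 0 to |a|, take every k-element combination.
    subsets = chain.from_iterable(
        combinations(elems, k) for k in range(len(elems) + 1)
    )
    return {frozenset(s) for s in subsets}

p = power_set({0, 1, 2})
print(len(p))  # 8
```

A set with n elements has 2ⁿ subsets, which is why the three-element set above yields eight.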

preorder A reflexive and transitive relation (see section 2.4). 


range (ran(f)) The subset of the codomain that is actually output
by f; ran(f) = {y ∈ B : f(x) = y for some x ∈ A} (see
section 3.1). 

reflexive R is reflexive iff, for every x ∈ A, Rxx (see section 2.2).


satisfiable A set of sentences Γ is satisfiable if M ⊨ Γ for
some structure M, otherwise it is unsatisfiable (see
section 7.7).

sentence A formula with no free variable (see section 6.8).

sequence (finite) (A*) A finite string of elements of A; an
element of Aⁿ for some n (see section 1.3).

sequence (infinite) (A^ω) A gapless, unending sequence of
elements of A; formally, a function s : ℤ⁺ → A (see
section 1.3).

sequent An expression of the form Γ ⇒ Δ where Γ and Δ are
finite sequences of sentences (see section 10.1). 

set A collection of objects, considered independently of the way 
it is specified, of the order of the objects in the set, and 
of their multiplicity (see section 1.1). 

soundness Property of a derivation system: it is sound if
whenever Γ ⊢ A then Γ ⊨ A (see section 10.12 and
section 11.11).

strict linear order A connected strict order (see section 2.4). 

strict order An irreflexive, asymmetric, and transitive relation 
(see section 2.4). 

structure (M) An interpretation of a first-order language, con- 
sisting of a domain (of a structure) and assignments of 
the constant, predicate and function symbols of the lan- 
guage (see section 7.2). 

subformula Part of a formula which is itself a formula (see sec- 
tion 6.6). 

subset (A ⊆ B) A set every element of which is an element of a
given set B (see section 1.2). 

surjective f : A → B is surjective iff the range of f is all of B,
i.e., for every y ∈ B there is at least one x ∈ A such
that f(x) = y (see section 3.2).
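
For functions on finite sets, the definitions of injective, surjective, and bijective can be checked mechanically. The sketch below assumes a total function represented as a Python dict from arguments to values; the names `is_injective`, `is_surjective`, and `is_bijective` are illustrative only and do not come from the text.

```python
def is_injective(f):
    """No two arguments of f (a dict) are mapped to the same value."""
    values = list(f.values())
    return len(values) == len(set(values))

def is_surjective(f, codomain):
    """Every element of the codomain is the value of some argument."""
    return set(codomain) <= set(f.values())

def is_bijective(f, codomain):
    """Both injective and surjective."""
    return is_injective(f) and is_surjective(f, codomain)

f = {1: "a", 2: "b", 3: "a"}   # 1 and 3 share the value "a"
g = {1: "a", 2: "b", 3: "c"}   # every value is hit exactly once

print(is_injective(f), is_surjective(f, {"a", "b"}))  # False True
print(is_bijective(g, {"a", "b", "c"}))               # True
```

Note that surjectivity depends on the chosen codomain: `f` is surjective onto {"a", "b"} but not onto {"a", "b", "c"}.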

symmetric R is symmetric iff, whenever Rxy then also Ryx (see 
section 2.2). 


theorem (⊢ A) In the sequent calculus, a formula A is a theorem
(of logic) if there is a derivation of the sequent ⇒ A
(see section 10.8). In natural deduction, a formula A is a
theorem if there is a derivation of A with all assumptions
discharged (see section 11.7). We also say that A is a
theorem of a theory Γ if Γ ⊢ A.

total order see linear order. 

transitive R is transitive iff, whenever Rxy and Ryz, then also 
Rxz (see section 2.2). 

transitive closure (R⁺) The smallest transitive relation
containing R (see section 2.6).
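
For a finite relation, the transitive closure can be computed by repeatedly adding the pairs that transitivity requires until nothing new appears. The following Python sketch is one simple way to do this, assuming the relation is given as a set of pairs; the function name `transitive_closure` is chosen for this illustration and the procedure is not taken from the text.

```python
def transitive_closure(r):
    """Smallest transitive relation containing r (a finite set of pairs)."""
    closure = set(r)
    while True:
        # Collect <x, z> for every <x, y> and <y, z> already present.
        new_pairs = {
            (x, z)
            for (x, y1) in closure
            for (y2, z) in closure
            if y1 == y2
        }
        if new_pairs <= closure:
            # Nothing new was required, so the relation is transitive.
            return closure
        closure |= new_pairs

r = {(1, 2), (2, 3), (3, 4)}
print(sorted(transitive_closure(r)))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

Only pairs forced by transitivity are ever added, so the result is contained in every transitive relation that contains the input.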


undischarged see discharged.

union (A ∪ B) The set of all elements of A and B together: A ∪ B =
{x : x ∈ A ∨ x ∈ B} (see section 1.4).


valid (⊨ A) A sentence A is valid iff M ⊨ A for every structure M
(see section 7.7). 

variable assignment A function which maps each variable to an 
element of |M| (see section 7.4). 


x-variant Two variable assignments are x-variants, s ∼ₓ s′, if they
differ at most in what they assign to x (see section 7.4).

dated, photographer unknown. Alonzo Church Papers; 1924–
rapher unknown. From the Shelby White and Leon Levy Archives
courtesy of the Abteilung für Handschriften und Seltene Drucke,
Cod. Ms. D. Hilbert 754, Bl. 6 Nr. 25.